1. Introduction



1.1. Bayesian Inversion on $\mathbb{R}^n$



Consider the problem of finding $u \in \mathbb{R}^n$ from $y \in \mathbb{R}^J$, where $u$ and $y$ are related by the equation

$$ y = G(u). $$

We refer to $y$ as the observed data and to $u$ as the unknown. This problem may be difficult for a number of
reasons. We highlight two of these, both particularly relevant to our future developments.

1. The first difficulty, which may be illustrated in the case where $n = J$, concerns the fact that often the
equation is perturbed by noise and so we should really consider the equation

$$ y = G(u) + \eta, \tag{1.1} $$

where $\eta \in \mathbb{R}^J$ represents the observational noise which enters the observed data. It may then be the
case that, because of the noise, $y$ is not in the image of $G$, so that simply inverting $G$ on the data $y$
will not be possible. Furthermore, the specific instance of $\eta$ which enters the data may not be known
to us; typically, at best, only the statistical properties of a typical noise $\eta$ are known. Thus we cannot
subtract $\eta$ from the observed data $y$ to obtain something in the image of $G$.
2. The second difficulty is manifest in the case where $n > J$, so that the system is underdetermined: the
number of equations is smaller than the number of unknowns. How do we attach a sensible meaning
to the concept of solution in this case where, generically, there will be many solutions?

Thinking probabilistically enables us to overcome both of these difficulties. We will treat $u$, $y$ and $\eta$ as
random variables and define the "solution" of the inverse problem to be the probability distribution of $u$
given $y$, denoted $u|y$. This allows us to model the noise via its statistical properties, even if we do not know
the exact instance of the noise entering the given data. It also allows us to specify a priori the form of
solutions that we believe to be more likely, thereby enabling us to attach weights to multiple solutions which
explain the data. This is the Bayesian approach to inverse problems.

To this end, we define a random variable $(u, y) \in \mathbb{R}^n \times \mathbb{R}^J$ as follows. We let $u \in \mathbb{R}^n$ be a random
variable with (Lebesgue) density $\rho_0(u)$. Assume that $y|u$ ($y$ given $u$) is defined via the formula (1.1), where
$G : \mathbb{R}^n \to \mathbb{R}^J$ is measurable, and $\eta$ is independent of $u$ (we sometimes write this as $\eta \perp u$) and has Lebesgue
density $\rho(\eta)$. Then $(u, y) \in \mathbb{R}^n \times \mathbb{R}^J$ is a random variable with Lebesgue density $\rho(y - G(u))\rho_0(u)$.
The following theorem allows us to calculate the distribution of the random variable $u|y$:

Theorem 1.1. Bayes' Theorem. Assume that

$$ Z := \int_{\mathbb{R}^n} \rho(y - G(u))\rho_0(u)\,du > 0. $$

Then $u|y$ is a random variable with Lebesgue density $\rho^y(u)$ given by

$$ \rho^y(u) = \frac{1}{Z}\rho(y - G(u))\rho_0(u). $$

Remarks 1.2. The following remarks establish the nomenclature of Bayesian statistics, and also frame the
previous theorem in a manner which generalizes to the infinite dimensional setting.

• $\rho_0(u)$ is the prior density.

• $\rho(y - G(u))$ is the likelihood.

• $\rho^y(u)$ is the posterior density.

• It will be useful in what follows to define

$$ \Phi(u; y) = -\log\rho(y - G(u)). $$

We call $\Phi$ the potential. This is the negative log likelihood.
• Let $\mu^y$ be the measure on $\mathbb{R}^n$ with density $\rho^y$ and $\mu_0$ the measure on $\mathbb{R}^n$ with density $\rho_0$. Then the conclusion
of Theorem 1.1 may be written as:

$$ \frac{d\mu^y}{d\mu_0}(u) = \frac{1}{Z}\exp(-\Phi(u; y)), \tag{1.2} $$

$$ Z = \int_{\mathbb{R}^n} \exp(-\Phi(u; y))\,\mu_0(du). $$

Thus the posterior is absolutely continuous with respect to the prior, and the Radon-Nikodym derivative
is proportional to the likelihood. The expression for the Radon-Nikodym derivative is to be interpreted
as the statement that, for all measurable $f : \mathbb{R}^n \to \mathbb{R}$,

$$ \mathbb{E}^{\mu^y} f(u) = \mathbb{E}^{\mu_0}\Big(\frac{d\mu^y}{d\mu_0}(u)\,f(u)\Big). $$

Alternatively we may write this in integral form as

$$ \int_{\mathbb{R}^n} f(u)\,\mu^y(du) = \int_{\mathbb{R}^n} \Big(\frac{1}{Z}\exp(-\Phi(u; y))\,f(u)\Big)\mu_0(du). $$
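The finite dimensional Bayes formula can be checked numerically. The following is only a minimal sketch under stated assumptions, not part of the original notes: it takes $n = J = 1$, a Gaussian prior and Gaussian noise, and approximates the integral defining $Z$ by a Riemann sum on a uniform grid. The function names, the grid and the linear test map $G(u) = u$ are all illustrative choices.

```python
import math


def gaussian_pdf(x, std):
    """Density of N(0, std^2) evaluated at x."""
    return math.exp(-0.5 * (x / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def potential(u, y, forward, noise_std):
    """Phi(u; y) = -log rho(y - G(u)) for Gaussian noise N(0, noise_std^2)."""
    residual = (y - forward(u)) / noise_std
    return 0.5 * residual ** 2 + math.log(noise_std * math.sqrt(2.0 * math.pi))


def posterior_on_grid(y, forward, grid, noise_std=1.0, prior_std=1.0):
    """Return (posterior density values on the grid, normalisation constant Z)."""
    h = grid[1] - grid[0]  # uniform grid spacing is assumed
    unnormalised = [
        math.exp(-potential(u, y, forward, noise_std)) * gaussian_pdf(u, prior_std)
        for u in grid
    ]
    Z = h * sum(unnormalised)  # Riemann-sum approximation of Z
    return [w / Z for w in unnormalised], Z


# Linear test case G(u) = u with unit prior and noise variances:
# the exact posterior is then N(y / 2, 1 / 2).
grid = [-8.0 + 16.0 * i / 3200 for i in range(3201)]
h = grid[1] - grid[0]
density, Z = posterior_on_grid(1.0, lambda u: u, grid)
posterior_mean = h * sum(u * d for u, d in zip(grid, density))
print(Z, posterior_mean)
```

Replacing the lambda by a nonlinear map gives a posterior that is no longer Gaussian, while the same normalisation by $Z$ applies.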

1.2. Inverse Heat Equation 

This inverse problem illustrates the first difficulty, labelled 1. in the previous subsection, which motivates
the Bayesian approach to inverse problems. Let $D \subset \mathbb{R}^d$ be a bounded open set, with smooth boundary $\partial D$.
Then define the Hilbert space $H$ and operator $A$ as follows:

$$ H = \big(L^2(D), \langle\cdot,\cdot\rangle, \|\cdot\|\big); $$
$$ A = -\Delta, \quad D(A) = H^2(D) \cap H^1_0(D). $$

Lemma 1.3. The eigenvalue problem

$$ A\varphi_j = \alpha_j\varphi_j $$

has a countably infinite set of solutions, indexed by $j \in \mathbb{Z}^+$, and satisfying the $L^2$-orthonormality condition

$$ \langle\varphi_j, \varphi_k\rangle = \begin{cases} 1, & j = k \\ 0, & j \neq k. \end{cases} $$

Furthermore, the eigenvalues are positive and, if ordered to be increasing, satisfy $\alpha_j \asymp j^{2/d}$.

Consider the heat conduction equation on $D$, with Dirichlet boundary conditions, writing it as an ordinary
differential equation in $H$:

$$ \frac{dv}{dt} + Av = 0, \quad v(0) = u. \tag{1.3} $$

We have the following:

Lemma 1.4. For every $u \in H$ there is a unique solution $v$ of equation (1.3) in the space $C([0, \infty); H)$.
Note that, if the initial condition is expanded in the eigenbasis as

$$ u = \sum_{j=1}^{\infty} u_j\varphi_j, \quad u_j = \langle u, \varphi_j\rangle, $$

then the solution of (1.3) has the form

$$ v(t) = \sum_{j=1}^{\infty} u_j e^{-\alpha_j t}\varphi_j. $$

We will be interested in the inverse problem of finding $u$ from $y$ where

$$ y = v(1) + \eta = G(u) + \eta. $$

Here $\eta \in H$ is noise and $G(u) := v(1) = e^{-A}u$. Formally this looks like an infinite dimensional linear version
of the inverse problem (1.1), extended from finite dimensions to a Hilbert space setting. However, the inverse
operator $G^{-1} = e^{A}$ is not continuous on $H$ and so we need regularization to make sense of the problem. Thus, if the
noise $\eta \in H$, it will not be possible to simply apply $G^{-1}$ to $y$; this is difficulty 1. from the preceding subsection.
We will apply a Bayesian approach and hence will need to put probability measures on the Hilbert space $H$;
in particular we will want to study $\mathbb{P}(u)$, $\mathbb{P}(y|u)$ and $\mathbb{P}(u|y)$, all probability measures on $H$.
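The lack of continuity of the inverse can be made concrete in the eigenbasis. The sketch below is an illustration under assumptions that are not in the text: it takes $d = 1$ and $D = (0, 1)$, for which the Dirichlet eigenvalues are $\alpha_j = (j\pi)^2$, and works with the first five eigen-coefficients only. The function names are hypothetical.

```python
import math


def dirichlet_eigenvalue(j):
    """alpha_j for A = -d^2/dx^2 on (0, 1) with Dirichlet conditions (assumed 1D setting)."""
    return (j * math.pi) ** 2


def forward_coeffs(u_coeffs, time=1.0):
    """Eigen-coefficients of v(time) = exp(-A * time) u."""
    return [u * math.exp(-dirichlet_eigenvalue(j) * time)
            for j, u in enumerate(u_coeffs, start=1)]


def naive_inverse_coeffs(y_coeffs, time=1.0):
    """Apply exp(A * time) mode by mode: the unregularised inverse of G."""
    return [y * math.exp(dirichlet_eigenvalue(j) * time)
            for j, y in enumerate(y_coeffs, start=1)]


u_true = [1.0 / j ** 2 for j in range(1, 6)]
y_clean = forward_coeffs(u_true)

# Perturb the data by noise eps * phi_5, which has norm eps in H.
eps = 1e-8
y_noisy = list(y_clean)
y_noisy[4] += eps

u_rec = naive_inverse_coeffs(y_noisy)
error = math.sqrt(sum((a - b) ** 2 for a, b in zip(u_rec, u_true)))
amplification = math.exp(dirichlet_eigenvalue(5))
print(error, eps * amplification)
```

A data perturbation of size `eps` in mode $j$ is amplified by $e^{\alpha_j}$, which grows without bound in $j$; this is the ill-posedness that the prior is intended to regularize.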

1.3. Elliptic Inverse Problem 

One motivation for adopting the Bayesian approach to inverse problems is that prior modelling is a transparent
approach to dealing with under-determined inverse problems; it forms a rational approach to dealing
with the second difficulty, labelled 2. in subsection 1.1. The elliptic inverse problem we now describe
is a concrete example of an under-determined inverse problem.

As in Section 1.2, $D \subset \mathbb{R}^d$ denotes a bounded open set, with smooth boundary $\partial D$. We define the Hilbert
spaces (Gelfand triple) $V \subset H \subset V^*$ as follows:

$$ H = \big(L^2(D), \langle\cdot,\cdot\rangle, \|\cdot\|\big); $$
$$ V = H^1_0(D) \text{ with norm } \|\cdot\|_V = \|\nabla\cdot\|; $$
$$ V^* \text{ the dual space of } V; $$
$$ \|\cdot\| \le C_p\|\cdot\|_V \quad \text{(Poincaré inequality)}. $$

Let $\kappa \in X := L^\infty(D)$ satisfy

$$ \operatorname*{ess\,inf}_{x \in D} \kappa(x) = \kappa_{\min} > 0. \tag{1.4} $$

Now consider the equation

$$ -\nabla\cdot(\kappa\nabla p) = f, \quad x \in D, \tag{1.5a} $$
$$ p = 0, \quad x \in \partial D. \tag{1.5b} $$

Lemma 1.5. Assume that $f \in V^*$ and that $\kappa$ satisfies (1.4). Then (1.5) has a unique weak solution $p \in V$.
This solution satisfies

$$ \|p\|_V \le \|f\|_{V^*}/\kappa_{\min} $$

and, if $f \in H$,

$$ \|p\|_V \le C_p\|f\|/\kappa_{\min}. $$

We will be interested in the inverse problem of finding $\kappa$ from $y$ where

$$ y_j = l_j(p) + \eta_j, \quad j = 1, \dots, J. \tag{1.6} $$

Here $l_j \in V^*$ is a continuous linear functional on $V$ and $\eta_j$ is a noise.

Notice that the unknown, $\kappa \in L^\infty(D)$, is a function (infinite dimensional) whereas the data from which
we wish to determine $\kappa$ is finite dimensional: $y \in \mathbb{R}^J$. The problem is severely under-determined, illustrating
point 2. from subsection 1.1. It is natural to treat such problems in the Bayesian framework, using
prior modeling to fill in missing information. We will take the unknown function to be $u$ where either $u = \kappa$
or $u = \log\kappa$. In either case, we will define $G_j(u) = l_j(p)$ and then (1.6) may be written as

$$ y = G(u) + \eta \tag{1.7} $$

where $y, \eta \in \mathbb{R}^J$ and $G : X' \subseteq X \to \mathbb{R}^J$. The set $X'$ is introduced because $G$ may not be defined on the
whole of $X$. In particular, the positivity constraint (1.4) is only satisfied on

$$ X' := \Big\{u \in X : \operatorname*{ess\,inf}_{x \in D} u(x) > 0\Big\} \subset X $$

in the case where $\kappa = u$. On the other hand, if $\kappa = \exp(u)$ then the positivity constraint (1.4) is satisfied for
any $u \in X$.

Notice that we again need probability measures on function space, here the Banach space $X = L^\infty(D)$.
Furthermore, these probability measures should charge only positive functions, in view of the desired inequality
(1.4).



2. Prior Modeling 
2.1. General Setting 

We let $\{\phi_j\}_{j=0}^{\infty}$ denote an infinite sequence in the Banach space $(X, \|\cdot\|)$ of $\mathbb{R}$-valued functions defined on
$D \subset \mathbb{R}^d$, a bounded, open set with smooth boundary. (The extension to $\mathbb{R}^m$-valued functions is straightforward,
but omitted for brevity.) We normalize these functions so that $\|\phi_j\| = 1$ for $j = 1, \dots, \infty$; we do not
assume that $\phi_0$ is normalized. Define the function $u$ by

$$ u = \phi_0 + \sum_{j=1}^{\infty} u_j\phi_j. \tag{2.1} $$

By randomizing the coefficient sequence $\{u_j\}_{j \ge 1}$ we create random functions. To this end we define the deterministic sequence
$\gamma = \{\gamma_j\}_{j=1}^{\infty}$ and the i.i.d. random sequence $\xi = \{\xi_j\}_{j=1}^{\infty}$, and set $u_j = \gamma_j\xi_j$. We let $(\Omega, \mathcal{F}, \mathbb{P})$ denote
the probability space for the i.i.d. sequence $\xi \in \Omega = \mathbb{R}^{\infty}$, with $\mathbb{E}$ denoting expectation. In the next three
subsections we demonstrate how this general setting may be adapted to create a variety of useful prior
measures on function space. On occasion we will find it useful to consider the truncated random functions

$$ u^N = \phi_0 + \sum_{j=1}^{N} u_j\phi_j, \quad u_j = \gamma_j\xi_j. \tag{2.2} $$



2.2. Uniform Priors 

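Choose $X = L^\infty(D)$. Assume $u_j = \gamma_j\xi_j$ with $\xi = \{\xi_j\}_{j=1}^{\infty}$ an i.i.d. sequence with $\xi_1 \sim U[-1, 1]$ and
$\gamma = \{\gamma_j\}_{j=1}^{\infty} \in \ell^1$. Assume further that there are finite, positive constants $\phi_{\min}$, $\phi_{\max}$, $\delta$ such that

$$ \operatorname*{ess\,inf}_{x \in D} \phi_0(x) \ge \phi_{\min}; $$
$$ \operatorname*{ess\,sup}_{x \in D} \phi_0(x) \le \phi_{\max}; $$
$$ \|\gamma\|_{\ell^1} = \frac{\delta}{1+\delta}\phi_{\min}. $$

Theorem 2.1. The following holds $\mathbb{P}$-almost surely: the sequence of functions $\{u^N\}_{N=1}^{\infty}$ given by (2.2) is
Cauchy in $X$ and the limiting function $u$ given by (2.1) satisfies

$$ \frac{1}{1+\delta}\phi_{\min} \le u(x) \le \phi_{\max} + \frac{\delta}{1+\delta}\phi_{\min} \quad \text{a.e. } x \in D. $$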
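Proof. Let $N > M$. Then, $\mathbb{P}$-a.s.,

$$ \begin{aligned}
\|u^N - u^M\|_\infty &= \Big\|\sum_{j=M+1}^{N} u_j\phi_j\Big\|_\infty \\
&= \Big\|\sum_{j=M+1}^{N} \gamma_j\xi_j\phi_j\Big\|_\infty \\
&\le \sum_{j=M+1}^{\infty} |\gamma_j||\xi_j|\|\phi_j\|_\infty \\
&\le \sum_{j=M+1}^{\infty} |\gamma_j|.
\end{aligned} $$

The right hand side tends to zero as $M \to \infty$ by the dominated convergence theorem, since $\gamma \in \ell^1$, and hence the sequence
is Cauchy in $X$.

We have $\mathbb{P}$-a.s. and for a.e. $x \in D$,

$$ \begin{aligned}
u(x) &\ge \phi_0(x) - \sum_{j=1}^{\infty} |u_j|\|\phi_j\|_\infty \\
&\ge \operatorname*{ess\,inf}_{x \in D} \phi_0(x) - \sum_{j=1}^{\infty} |\gamma_j| \\
&\ge \phi_{\min} - \|\gamma\|_{\ell^1} \\
&= \frac{1}{1+\delta}\phi_{\min}.
\end{aligned} $$

Proof of the upper bound is similar. □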
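The bounds of Theorem 2.1 can be observed on a sampled truncation $u^N$. The sketch below rests on assumptions that are not in the text: $D = (0, 1)$, basis functions $\phi_j(x) = \cos(2\pi j x)$ (which have sup-norm one), $\gamma_j \propto j^{-2}$ scaled so that $\|\gamma\|_{\ell^1} = \frac{\delta}{1+\delta}\phi_{\min}$, and a hypothetical mean function $\phi_0$.

```python
import math
import random


def sample_uniform_prior(n_terms, delta, phi_min, grid, rng):
    """Draw u^N on `grid` with xi_j ~ U[-1, 1] and gamma_j proportional to j^-2."""
    # Scale so that the full l^1 norm of gamma equals delta * phi_min / (1 + delta),
    # using sum_{j >= 1} j^-2 = pi^2 / 6.
    scale = delta * phi_min / ((1.0 + delta) * (math.pi ** 2 / 6.0))
    gammas = [scale / j ** 2 for j in range(1, n_terms + 1)]
    xis = [rng.uniform(-1.0, 1.0) for _ in gammas]

    def phi0(x):
        # Hypothetical mean function: ess inf = phi_min, ess sup = phi_min + 2.
        return phi_min + 1.0 + math.sin(2.0 * math.pi * x)

    def u(x):
        series = sum(g * xi * math.cos(2.0 * math.pi * j * x)
                     for j, (g, xi) in enumerate(zip(gammas, xis), start=1))
        return phi0(x) + series

    return [u(x) for x in grid]


delta, phi_min = 0.5, 1.0
phi_max = phi_min + 2.0
grid = [i / 200 for i in range(201)]
sample = sample_uniform_prior(50, delta, phi_min, grid, random.Random(0))

lower = phi_min / (1.0 + delta)
upper = phi_max + delta * phi_min / (1.0 + delta)
print(lower <= min(sample), max(sample) <= upper)
```

Because $|\xi_j| \le 1$, the bounds hold for every draw, not merely with high probability, which is what the almost sure statement of the theorem expresses.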



Example Consider the random function (2.1) as specified in this section. By Theorem 2.1 we have that,
$\mathbb{P}$-a.s.,

$$ u(x) \ge \frac{1}{1+\delta}\phi_{\min} > 0, \quad \text{a.e. } x \in D. \tag{2.3} $$

Set $\kappa = u$ in the elliptic equation (1.5), so that the coefficient $\kappa$ in the equation and the solution $p$ are
random variables on $(\Omega, \mathcal{F}, \mathbb{P})$. Since (2.3) holds $\mathbb{P}$-a.s., Lemma 1.5 shows that, again $\mathbb{P}$-a.s.,

$$ \|p\|_V \le (1+\delta)\|f\|_{V^*}/\phi_{\min}. $$

Since the r.h.s. is non-random we have that for all $r \in \mathbb{Z}^+$ the random variable $p \in L^r_{\mathbb{P}}(\Omega; V)$:

$$ \mathbb{E}\|p\|^r_V < \infty. $$

In fact $\mathbb{E}\exp(\alpha\|p\|^r_V) < \infty$ for all $r \in \mathbb{Z}^+$ and $\alpha \in (0, \infty)$. □
2.3. Besov Priors 

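Now we set $\phi_0 = 0$ and let $\{\phi_j\}_{j=1}^{\infty}$ be an orthonormal basis for $X$. Let

$$ X := \dot{L}^2(\mathbb{T}^d) = \Big\{u \,\Big|\, \int_{\mathbb{T}^d} |u(x)|^2\,dx < \infty, \ \int_{\mathbb{T}^d} u(x)\,dx = 0\Big\} $$

for $d \le 3$, with inner product and norm denoted by $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$ respectively. Then, for any $u \in X$, we have

$$ u(x) = \sum_{j=1}^{\infty} u_j\phi_j(x), \quad u_j = \langle u, \phi_j\rangle. \tag{2.4} $$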
Given a function written in this form, we define the Banach space $X^{t,q}$ via the norm

$$ \|u\|_{X^{t,q}} = \Big(\sum_{j=1}^{\infty} j^{(\frac{tq}{d} + \frac{q}{2} - 1)}|u_j|^q\Big)^{\frac{1}{q}} $$

with $q \ge 1$ and $t > 0$. If $\{\phi_j\}$ form the Fourier basis and $q = 2$ then $X^{t,2}$ is the Sobolev space $\dot{H}^t$ of
mean-zero periodic functions with $t$ (possibly non-integer) square-integrable derivatives. On the other hand,
if the $\{\phi_j\}$ form certain wavelet bases, then $X^{t,q}$ is the Besov space $B^t_{qq}$.

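Now we set $\phi_0 = 0$, let $\{\phi_j\}_{j=1}^{\infty}$ be an orthonormal basis for $X$, and consider (2.1). As described above,
we assume that $u_j = \gamma_j\xi_j$ where $\xi = \{\xi_j\}_{j=1}^{\infty}$ is an i.i.d. sequence and $\gamma = \{\gamma_j\}_{j=1}^{\infty}$ is deterministic. Here we
assume that $\xi_1$ is drawn from the centred measure on $\mathbb{R}$ with density proportional to $\exp\big(-\frac{1}{2}|x|^q\big)$ for some
$1 \le q < \infty$. Then for $s > 0$, $\delta > 0$ we define

$$ \gamma_j = j^{-(\frac{s}{d} + \frac{1}{2} - \frac{1}{q})}\Big(\frac{1}{\delta}\Big)^{\frac{1}{q}}. $$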

Then for functions of the form (2.1) we have

Theorem 2.2. The following are equivalent:
i) $\|u\|_{X^{t,q}} < \infty$ $\mathbb{P}$-a.s.;
ii) $\mathbb{E}\big(\exp(\alpha\|u\|^q_{X^{t,q}})\big) < \infty$ for any $\alpha \in [0, \frac{\delta}{2})$;
iii) $t < s - \frac{d}{q}$.

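Proof. We first note that, for the random function in question,

$$ \|u\|^q_{X^{t,q}} = \sum_{j=1}^{\infty} j^{(\frac{tq}{d} + \frac{q}{2} - 1)}|u_j|^q = \frac{1}{\delta}\sum_{j=1}^{\infty} j^{-\frac{(s-t)q}{d}}|\xi_j|^q. $$

Now, for $\alpha < \frac{1}{2}$,

$$ \mathbb{E}\exp(\alpha|\xi_1|^q) = \frac{\int_{\mathbb{R}} \exp\big(-(\frac{1}{2} - \alpha)|x|^q\big)\,dx}{\int_{\mathbb{R}} \exp\big(-\frac{1}{2}|x|^q\big)\,dx} = (1 - 2\alpha)^{-\frac{1}{q}}. $$

iii) $\Rightarrow$ ii).

$$ \begin{aligned}
\mathbb{E}\big(\exp(\alpha\|u\|^q_{X^{t,q}})\big) &= \mathbb{E}\Big(\exp\Big(\frac{\alpha}{\delta}\sum_{j=1}^{\infty} j^{-\frac{(s-t)q}{d}}|\xi_j|^q\Big)\Big) \\
&= \prod_{j=1}^{\infty}\Big(1 - \frac{2\alpha}{\delta}j^{-\frac{(s-t)q}{d}}\Big)^{-\frac{1}{q}}.
\end{aligned} $$

Since $\alpha < \frac{\delta}{2}$ the product converges if $\frac{(s-t)q}{d} > 1$, i.e. $t < s - \frac{d}{q}$, as required.

ii) $\Rightarrow$ i).

This is automatic since, for any random variable $u$ and any positive function $f$, $\mathbb{E}f(u) < \infty$ implies $f(u) < \infty$
a.s.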

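i) $\Rightarrow$ iii).

To show that (i) implies (iii) note that (i) implies that, almost surely,

$$ \sum_{j=1}^{\infty} j^{(t-s)q/d}|\xi_j|^q < \infty. $$

Define $\zeta_j = j^{(t-s)q/d}|\xi_j|^q$. Using the fact that the $\zeta_j$ are non-negative and independent we deduce from
Lemma 2.3 that

$$ \sum_{j=1}^{\infty} \mathbb{E}(\zeta_j \wedge 1) = \sum_{j=1}^{\infty} \mathbb{E}\big(j^{(t-s)q/d}|\xi_j|^q \wedge 1\big) < \infty. $$

This implies that $t < s$. We note that then

$$ \begin{aligned}
\mathbb{E}\zeta_j &= \mathbb{E}\big(j^{-(s-t)q/d}|\xi_j|^q\big) \\
&= \mathbb{E}\big(j^{-(s-t)q/d}|\xi_j|^q\,\mathbb{I}\{|\xi_j| \le j^{(s-t)/d}\}\big) + \mathbb{E}\big(j^{-(s-t)q/d}|\xi_j|^q\,\mathbb{I}\{|\xi_j| > j^{(s-t)/d}\}\big) \\
&\le \mathbb{E}\big((\zeta_j \wedge 1)\,\mathbb{I}\{|\xi_j| \le j^{(s-t)/d}\}\big) + I \\
&\le \mathbb{E}(\zeta_j \wedge 1) + I,
\end{aligned} $$

where

$$ I \propto j^{-(s-t)q/d}\int_{j^{(s-t)/d}}^{\infty} x^q e^{-x^q/2}\,dx. $$

Noting that, since $q \ge 1$, the function $x \mapsto x^q e^{-x^q/2}$ is bounded, up to a constant of proportionality, by the
function $x \mapsto e^{-\alpha x}$ for any $\alpha < \frac{1}{2}$, we see that there is a positive constant $K$ such that

$$ I \le K j^{-(s-t)q/d}\int_{j^{(s-t)/d}}^{\infty} e^{-\alpha x}\,dx = \frac{1}{\alpha}K j^{-(s-t)q/d}\exp\big(-\alpha j^{(s-t)/d}\big) := \iota_j. $$

Thus we have shown that

$$ \sum_{j=1}^{\infty} \mathbb{E}\big(j^{-(s-t)q/d}|\xi_j|^q\big) \le \sum_{j=1}^{\infty} \mathbb{E}(\zeta_j \wedge 1) + \sum_{j=1}^{\infty} \iota_j < \infty. $$

Since the $\xi_j$ are i.i.d. this implies that

$$ \sum_{j=1}^{\infty} j^{(t-s)q/d} < \infty, $$

from which it follows that $(s - t)q/d > 1$ and (iii) follows. □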

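Lemma 2.3. Let $\{I_j\}_{j=1}^{\infty}$ be an independent sequence of $\mathbb{R}^+$-valued random variables. Then

$$ \sum_{j=1}^{\infty} I_j < \infty \ \text{a.s.} \quad \Longleftrightarrow \quad \sum_{j=1}^{\infty} \mathbb{E}(I_j \wedge 1) < \infty. $$
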

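2.4. Gaussian Priors

Let $X$ be a Hilbert space $\mathcal{H}$ with inner product and norm denoted by $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$ respectively. Assume
that $\{\phi_j\}_{j=1}^{\infty}$ is an orthonormal basis for $\mathcal{H}$. Define

$$ \mathcal{H}^t = \Big\{u \,\Big|\, \sum_{j=1}^{\infty} j^{\frac{2t}{d}}|u_j|^2 < \infty, \ u_j = \langle u, \phi_j\rangle\Big\}. \tag{2.5} $$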
As in Section 2.3, we consider the setting in which $\phi_0 = 0$, so that the function $u$ is given by (2.4). We
choose $\xi_1 \sim N(0, 1)$ and $\gamma_j \asymp j^{-s/d}$. We are interested in convergence of the following series, found from (2.2)
with $\phi_0 = 0$:

$$ u^N = \sum_{j=1}^{N} u_j\phi_j, \quad u_j = \gamma_j\xi_j. \tag{2.6} $$

To understand this sequence of functions, indexed by $N$, it is useful to introduce the following function space:

$$ L^2_{\mathbb{P}}(\Omega; \mathcal{H}^t) := \big\{u : \Omega \times D \to \mathbb{R} \,\big|\, \mathbb{E}\|u\|^2_{\mathcal{H}^t} < \infty\big\}. $$

This is in fact a Hilbert space.

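Theorem 2.4. The sequence of functions $\{u^N\}_{N=1}^{\infty}$ is Cauchy in the Hilbert space $L^2_{\mathbb{P}}(\Omega; \mathcal{H}^t)$, $t < s - \frac{d}{2}$.
Thus the infinite series

$$ u = \sum_{j=1}^{\infty} u_j\phi_j, \quad u_j = \gamma_j\xi_j \tag{2.7} $$

exists as an $L^2$-limit and takes values in $\mathcal{H}^t$ for $t < s - \frac{d}{2}$.

Proof. For $N > M$,

$$ \mathbb{E}\|u^N - u^M\|^2_{\mathcal{H}^t} = \mathbb{E}\sum_{j=M+1}^{N} j^{\frac{2t}{d}}|u_j|^2 \asymp \sum_{j=M+1}^{N} j^{\frac{2(t-s)}{d}} \le \sum_{j=M+1}^{\infty} j^{\frac{2(t-s)}{d}}. $$

The sum on the right hand side tends to $0$ as $M \to \infty$, provided $\frac{2(t-s)}{d} < -1$, by the dominated convergence
theorem. This completes the proof. □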
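The role of the threshold $t < s - \frac{d}{2}$ can be seen by evaluating the sums in the proof directly. The sketch below assumes $\gamma_j = j^{-s/d}$ exactly (the text only requires $\gamma_j \asymp j^{-s/d}$) and uses the illustrative values $s = 2$, $d = 1$; the function names are hypothetical.

```python
import math
import random


def expected_sq_norm(n_terms, s, t, d):
    """E||u^N||^2 in H^t when gamma_j = j^(-s/d) exactly (an assumption)."""
    return sum(j ** (2.0 * (t - s) / d) for j in range(1, n_terms + 1))


def sample_kl_coefficients(n_terms, s, d, rng):
    """Draw u_j = gamma_j * xi_j with xi_j ~ N(0, 1) i.i.d."""
    return [j ** (-s / d) * rng.gauss(0.0, 1.0) for j in range(1, n_terms + 1)]


def sq_norm(coeffs, t, d):
    """Squared H^t norm of a truncated expansion."""
    return sum(j ** (2.0 * t / d) * c * c for j, c in enumerate(coeffs, start=1))


s, d = 2.0, 1.0

# t = 1 < s - d/2: the contribution of modes 10^4 < j <= 10^5 is tiny.
tail_convergent = expected_sq_norm(100000, s, 1.0, d) - expected_sq_norm(10000, s, 1.0, d)

# t = s - d/2 = 1.5: the sum is harmonic, and the same block of modes
# contributes roughly log(10), so the partial sums do not converge.
tail_critical = expected_sq_norm(100000, s, 1.5, d) - expected_sq_norm(10000, s, 1.5, d)

one_draw = sq_norm(sample_kl_coefficients(1000, s, d, random.Random(1)), 1.0, d)
print(tail_convergent, tail_critical, one_draw)
```

Below the threshold the tail of the series is negligible, so a moderate truncation level $N$ already captures the $\mathcal{H}^t$ norm; at the threshold each decade of modes adds a comparable amount.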

Remarks 2.5. We make the following remarks concerning the Gaussian random functions constructed in
the preceding theorem.

• The preceding theorem shows that the sum (2.6) has an $L^2_{\mathbb{P}}$ limit in $\mathcal{H}^t$ when $t < s - d/2$. The same
methods used to prove Theorem 2.2 show that the sum also has an almost sure limit in $\mathcal{H}^t$ when
$t < s - d/2$. Indeed, for $t < s - \frac{d}{2}$,

$$ \mathbb{E}\|u\|^2_{\mathcal{H}^t} = \sum_{j=1}^{\infty} j^{\frac{2t}{d}}\,\mathbb{E}(\gamma_j^2\xi_j^2) = \sum_{j=1}^{\infty} j^{\frac{2t}{d}}\gamma_j^2 \asymp \sum_{j=1}^{\infty} j^{\frac{2(t-s)}{d}} < \infty. $$

Thus $u \in \mathcal{H}^t$ a.s., $t < s - \frac{d}{2}$.

• From the preceding theorem we see that, provided $s > \frac{d}{2}$, the random function in (2.7) generates a mean
zero Gaussian measure on $\mathcal{H}$. The expression (2.7) is known as the Karhunen-Loève expansion, and
the eigenfunctions $\{\phi_j\}_{j=1}^{\infty}$ as the Karhunen-Loève basis.
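• The following formal calculation gives an expression for the covariance operator:

$$ \begin{aligned}
C &= \mathbb{E}\,u(x) \otimes u(x) \\
&= \mathbb{E}\Big(\sum_{j=1}^{\infty}\sum_{k=1}^{\infty} \gamma_j\gamma_k\xi_j\xi_k\,\phi_j(x) \otimes \phi_k(x)\Big) \\
&= \sum_{j=1}^{\infty}\sum_{k=1}^{\infty} \gamma_j\gamma_k\,\mathbb{E}(\xi_j\xi_k)\,\phi_j(x) \otimes \phi_k(x) \\
&= \sum_{j=1}^{\infty}\sum_{k=1}^{\infty} \gamma_j\gamma_k\,\delta_{jk}\,\phi_j(x) \otimes \phi_k(x) \\
&= \sum_{j=1}^{\infty} \gamma_j^2\,\phi_j(x) \otimes \phi_j(x).
\end{aligned} $$

From this expression for the covariance, we may find eigenpairs explicitly:

$$ C\phi_k = \Big(\sum_{j=1}^{\infty} \gamma_j^2\,\phi_j(x) \otimes \phi_j(x)\Big)\phi_k = \sum_{j=1}^{\infty} \gamma_j^2\langle\phi_j, \phi_k\rangle\phi_j = \sum_{j=1}^{\infty} \gamma_j^2\delta_{jk}\phi_j = \gamma_k^2\phi_k. $$

The Gaussian measure is denoted $N(0, C)$ and the eigenfunctions of $C$, $\{\phi_j\}_{j=1}^{\infty}$, are the Karhunen-Loève
basis for the measure $\mu_0$. The $\gamma_j^2$ are the eigenvalues associated with this eigenbasis, and thus $\gamma_j$ is
the standard deviation of the Gaussian measure in the direction $\phi_j$.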



Example In the case where $\mathcal{H} = \dot{L}^2(\mathbb{T}^d)$ we are in the setting of Section 2.3. Furthermore, we now assume
that the $\{\phi_j\}_{j=1}^{\infty}$ constitute the Fourier basis. It then follows that $\mathcal{H}^t = \dot{H}^t(\mathbb{T}^d)$, the Sobolev space of periodic
functions on $[0, 1)^d$ with mean zero and $t$ (possibly negative or fractional) square integrable derivatives,
denoted by $\dot{H}^t$. Thus $u \in \dot{H}^t$ a.s., $t < s - \frac{d}{2}$.

A commonly arising choice of prior covariance operator is $C = A^{-\alpha}$ with $A = -\Delta$, $D(A) = \dot{H}^2(\mathbb{T}^d)$.
It then follows, analogously to the result of Lemma 1.3 in the case of Dirichlet boundary conditions, that
$\gamma_j^2 \asymp j^{-\frac{2\alpha}{d}}$. Thus $s = \alpha$ and $u \in \dot{H}^t$, $t < \alpha - \frac{d}{2}$. As a result, for any $t < \alpha - \frac{d}{2}$, it is possible to view the
resulting Gaussian measure as defined on the Hilbert space $\dot{H}^t$. In fact, by use of the Kolmogorov continuity
theorem, the Gaussian measures may also be defined on Hölder spaces $C^{0,t}$, for $t < \alpha - \frac{d}{2}$, if $\alpha - \frac{d}{2} \in (0, 1)$,
and $C^{r,\epsilon}$ with $r = \lfloor\alpha - \frac{d}{2}\rfloor$, $\epsilon = \alpha - \frac{d}{2} - r \in (0, 1)$. □

The previous example illustrates the fact that, although we have constructed Gaussian measures in a
Hilbert space setting, they may also be defined on Banach spaces, such as the space of Hölder continuous
functions. The following theorem then applies.

Theorem 2.6. Fernique Theorem. Let $\mu_0$ be a Gaussian measure on the separable Banach space $X$. Then
there exists $\beta_c \in (0, \infty)$ such that, for all $\beta \in (0, \beta_c)$,

$$ \mathbb{E}^{\mu_0}\exp\big(\beta\|u\|^2_X\big) < \infty. $$

Remark 2.7. Theorem 2.2 establishes this result in the case covered by the preceding example, for $X =
X^{t,2} = \dot{H}^t$, if $\mu_0 = N(0, A^{-\alpha})$ and $t < \alpha - \frac{d}{2}$.

Example Consider the random function (2.1) in the case where $\mathcal{H} = \dot{L}^2(\mathbb{T}^d)$ and $\mu_0 = N(0, A^{-\alpha})$, $\alpha > \frac{d}{2}$,
as in the preceding example. Then we know that $u \in C^{0,t}$, $t < (\alpha - \frac{d}{2}) \wedge 1$. Set $\kappa = e^u$ in the elliptic PDE
(1.5) so that the coefficient $\kappa$ and the solution $p$ are random variables on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$.
Then $\kappa_{\min}$ given in (1.4) satisfies

$$ \kappa_{\min} \ge \exp\big(-\|u\|_\infty\big). $$

By Lemma 1.5 we obtain

$$ \|p\|_V \le \exp\big(\|u\|_\infty\big)\|f\|_{V^*}. $$

Since $C^{0,t} \subset L^\infty(\mathbb{T}^d)$, $t \in (0, 1)$, we deduce that

$$ \|u\|_{L^\infty} \le K_1\|u\|_{C^{0,t}}. $$

Furthermore, for any $\epsilon > 0$, there is a constant $K_2 = K_2(\epsilon)$ such that $\exp(K_1 r x) \le K_2\exp(\epsilon x^2)$ for all $x \ge 0$.
Thus

$$ \begin{aligned}
\|p\|^r_V &\le \exp\big(K_1 r\|u\|_{C^{0,t}}\big)\|f\|^r_{V^*} \\
&\le K_2\exp\big(\epsilon\|u\|^2_{C^{0,t}}\big)\|f\|^r_{V^*}.
\end{aligned} $$

Hence, by Theorem 2.6, we deduce that

$$ \mathbb{E}\|p\|^r_V < \infty, \quad \text{i.e. } p \in L^r_{\mathbb{P}}(\Omega; V) \quad \forall r \in \mathbb{Z}^+. $$

Thus, when the coefficient of the elliptic PDE is log-normal, that is $\kappa$ is the exponential of a Gaussian
function, moments of all orders exist for the random variable $p$. However, unlike the case of the uniform
prior, we cannot obtain exponential moments on $\mathbb{E}\exp(\alpha\|p\|^r_V)$ for any $(r, \alpha) \in \mathbb{Z}^+ \times (0, \infty)$. This is because
the coefficient, whilst positive a.s., does not satisfy a uniform positive lower bound across the probability
space. □

2.5. Summary

In the preceding three subsections we have shown how to create random functions by randomizing the
coefficients of a series of functions. We have also studied the regularity properties of the resulting functions.
For the uniform prior we have shown that the random functions all live in a subset of $X = L^\infty$ characterized
by the upper and lower bounds given in Theorem 2.1; denote this subset by $X'$. For the Besov priors we have
shown in Theorem 2.2 that the random functions live in the Banach spaces $X^{t,q}$ for all $t < s - d/q$; denote
any one of these Banach spaces by $X'$. And finally for the Gaussian priors we have shown in Theorem 2.4
that the random function exists as an $L^2$-limit in any of the Hilbert spaces $\mathcal{H}^t$ for $t < s - d/2$. Furthermore,
we have indicated that, by use of the Kolmogorov test, we can also show that the Gaussian random functions
lie in certain Hölder spaces; denote any of the Hilbert or Banach spaces where the Gaussian random function
lies by $X'$. Thus, in all of these examples, we have created a probability measure $\mu_0$ which is the pushforward
of the measure $\mathbb{P}$ on the i.i.d. sequence $\xi$ under the map which takes the sequence into the random function.
This measure lives on $X'$, and we will often write $\mu_0(X') = 1$ to denote this fact. This is shorthand for saying
that functions drawn from $\mu_0$ are in $X'$ almost surely.

3. Posterior Distribution 

3.1. Conditioned Random Variables 

Key to the development of Bayes's Theorem, and the posterior distribution, is the notion of conditional 
random variables. In this section we develop an important theorem concerning conditioning. 

Let $(X, A)$ and $(Y, B)$ denote a pair of measurable spaces and let $\nu$ and $\pi$ be probability measures on
$X \times Y$. We assume that $\nu \ll \pi$. Thus there exists a $\pi$-measurable $\phi : X \times Y \to \mathbb{R}$ with $\phi \in L^1_\pi$ and

$$ \frac{d\nu}{d\pi}(x, y) = \phi(x, y). \tag{3.1} $$

That is, for $(x, y) \in X \times Y$,

$$ \mathbb{E}^\nu f(x, y) = \mathbb{E}^\pi\big(\phi(x, y)f(x, y)\big), $$

or, equivalently,

$$ \int_{X \times Y} f(x, y)\,\nu(dx, dy) = \int_{X \times Y} \phi(x, y)f(x, y)\,\pi(dx, dy). $$

Theorem 3.1. Assume that the conditional random variable $x|y$ exists under $\pi$ with probability distribution
denoted $\pi^y(dx)$. Then the conditional random variable $x|y$ under $\nu$ exists, with probability distribution denoted
by $\nu^y(dx)$. Furthermore, $\nu^y \ll \pi^y$ and

$$ \frac{d\nu^y}{d\pi^y}(x) = \begin{cases} \frac{1}{c(y)}\phi(x, y), & \text{if } c(y) > 0 \\ 1, & \text{otherwise} \end{cases} $$

with $c(y) = \int_X \phi(x, y)\,d\pi^y(x)$.



Example Let $X = C([0, 1]; \mathbb{R})$, $Y = \mathbb{R}$. Let $\pi$ denote the measure on $X \times Y$ induced by the random variable
$(w(\cdot), w(1))$, where $w$ is a draw from standard unit Wiener measure on $\mathbb{R}$, starting from $w(0) = z$.

Let $\pi^y$ denote the measure on $X$ found by conditioning Brownian motion to satisfy $w(1) = y$; thus $\pi^y$ is a
Brownian bridge measure with $w(0) = z$, $w(1) = y$.

Assume that $\nu \ll \pi$ with

$$ \frac{d\nu}{d\pi}(x, y) = \exp\big(-\Phi(x, y)\big). $$

(Such a formula arises from the Girsanov theorem, for example, in the theory of stochastic differential
equations, SDEs.) Assume further that

$$ \sup_{x \in X}\Phi(x, y) = \Phi^+(y) $$

and $\Phi^+(y) \in (0, \infty)$ for every $y \in \mathbb{R}$. Then

$$ c(y) = \int_X \exp\big(-\Phi(x, y)\big)\,d\pi^y(x) \ge \exp\big(-\Phi^+(y)\big) > 0. $$

Thus $\nu^y(dx)$ exists and

$$ \frac{d\nu^y}{d\pi^y}(x) = \frac{1}{c(y)}\exp\big(-\Phi(x, y)\big). \quad \square $$

The following lemma is useful for checking measurability. 

Lemma 3.2. Let $(Z, C)$ be a measurable space and assume that $G \in C(Z; \mathbb{R})$ and that $\pi(Z) = 1$ for some
probability measure $\pi$ on $Z$. Then $G$ is a $\pi$-measurable function.



3.2. Bayes' Theorem for Inverse Problems

Let $X$, $Y$ be separable Banach spaces, and $G : X \to Y$ a measurable mapping. We wish to solve the inverse
problem of finding $u$ from $y$ where

$$ y = G(u) + \eta \tag{3.2} $$

and $\eta \in Y$ denotes noise. We employ a Bayesian approach to this problem in which we let $(u, y) \in X \times Y$
be a random variable and compute $u|y$. We specify the random variable $(u, y)$ as follows:

• Prior: $u \sim \mu_0$, a measure on $X$.

• Noise: $\eta \sim Q_0$, a measure on $Y$, and $\eta \perp u$.
The random variable $y|u$ is then distributed according to the measure $Q_u$, the translate of $Q_0$ by $G(u)$.
We assume throughout the following that $Q_u \ll Q_0$ for $u$ $\mu_0$-a.s. Thus, for some potential $\Phi : X \times Y \to \mathbb{R}$,

$$ \frac{dQ_u}{dQ_0}(y) = \exp\big(-\Phi(u; y)\big). \tag{3.3} $$

For a given instance of the data $y$, $\Phi(u; y)$ is the negative log likelihood. Define $\nu_0$ to be the product measure
defined by

$$ \nu_0(du, dy) = Q_0(dy)\mu_0(du). \tag{3.4} $$

We also assume in what follows that $\Phi(\cdot, \cdot)$ is $\nu_0$ measurable. Then the random variable $(u, y) \in X \times Y$ is
distributed according to the measure $\nu(du, dy)$ where

$$ \frac{d\nu}{d\nu_0}(u, y) = \exp\big(-\Phi(u; y)\big). $$

We have the following infinite dimensional analogue of Theorem 1.1.

Theorem 3.3. Bayes' Theorem. Assume that $\Phi : X \times Y \to \mathbb{R}$ is $\nu_0$ measurable and that, for $y$ $Q_0$-a.s.,

$$ Z := \int_X \exp\big(-\Phi(u; y)\big)\mu_0(du) > 0. \tag{3.5} $$

Then the conditional distribution of $u|y$ exists under $\nu$, and is denoted $\mu^y$. Furthermore $\mu^y \ll \mu_0$ and, for $y$
$\nu$-a.s.,

$$ \frac{d\mu^y}{d\mu_0}(u) = \frac{1}{Z}\exp\big(-\Phi(u; y)\big). \tag{3.6} $$

Proof. First note that the positivity of $Z$ holds for $y$ $\nu_0$ almost surely, and hence, by absolute continuity of $\nu$
with respect to $\nu_0$, for $y$ $\nu$ almost surely. The proof is an application of Theorem 3.1 with $\pi$ replaced by $\nu_0$,
$\phi(x, y) = \exp(-\Phi(x, y))$ and $(x, y) \to (u, y)$. Since $\nu_0(du, dy)$ has product form, the conditional distribution
of $u|y$ under $\nu_0$ is simply $\mu_0$. The result follows. □

Remarks 3.4. In order to implement the derivation of Bayes' formula (3.6) four essential steps are required:

• Define a suitable prior measure $\mu_0$ and noise measure $Q_0$ whose independent product forms the reference
measure $\nu_0$.

• Determine the potential $\Phi$ such that formula (3.3) holds.

• Show that $\Phi$ is $\nu_0$ measurable.

• Show that the normalization constant $Z$ given by (3.5) is positive almost surely with respect to $y \sim Q_0$.

Remark 3.5. In formula (3.6) we can shift $\Phi(u; y)$ by any constant $c(y)$, independent of $u$, provided the constant
is finite $Q_0$-a.s. and hence $\nu$-a.s. Such a shift can be absorbed into a redefinition of the normalization
constant $Z$.
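The invariance under shifts described in Remark 3.5 is easy to observe numerically. The following sketch is an illustration only, under assumptions not made in the text: scalar unknown and data, $G(u) = u$, prior $N(0, 1)$ and unit noise variance. It approximates the posterior by reweighting prior samples with self-normalised weights proportional to $\exp(-\Phi(u; y))$, a Monte Carlo device chosen here for convenience; all names are hypothetical.

```python
import math
import random


def normalised_weights(potentials):
    """Self-normalised weights w_i proportional to exp(-Phi_i), computed stably."""
    shift = min(potentials)  # subtracting a constant does not change the weights
    unnormalised = [math.exp(-(p - shift)) for p in potentials]
    total = sum(unnormalised)
    return [w / total for w in unnormalised]


def potential(u, y):
    """Phi(u; y) = |y - u|^2 / 2 - |y|^2 / 2 for scalar data, G(u) = u, unit noise."""
    return 0.5 * (y - u) ** 2 - 0.5 * y ** 2


rng = random.Random(0)
y = 1.0
prior_samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]

phi = [potential(u, y) for u in prior_samples]
phi_shifted = [p + 0.5 * y ** 2 for p in phi]  # shift by c(y) = |y|^2 / 2

w = normalised_weights(phi)
w_shifted = normalised_weights(phi_shifted)

# Weighted average of prior samples approximates the posterior mean (exactly y / 2 here).
posterior_mean = sum(wi * ui for wi, ui in zip(w, prior_samples))
print(max(abs(a - b) for a, b in zip(w, w_shifted)), posterior_mean)
```

The subtraction of `min(potentials)` inside `normalised_weights` is itself an instance of the remark: it is a shift by a constant independent of $u$, used only to avoid floating point underflow.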

3.3. Heat Equation 

We apply Bayesian inversion to the heat equation from Section 1.2. Recall that for $G(u) = e^{-A}u$, we have
the relationship

$$ y = G(u) + \eta, $$

which we wish to invert. Let $X = H$ and define, for $u = \sum_{j} u_j\varphi_j$,

$$ \mathcal{H}^t = D(A^{t/2}) = \Big\{u \,\Big|\, \sum_{j=1}^{\infty} \alpha_j^t|u_j|^2 < \infty, \ u_j = \langle u, \varphi_j\rangle\Big\}. $$

Recall from Lemma 1.3 that $\alpha_j \asymp j^{2/d}$, so this agrees with $\mathcal{H}^t$ as defined in subsection 2.4. Furthermore, we
observe that

$$ \mathcal{H}^t = D(A^{t/2}) = \{w \mid w = A^{-t/2}w_0, \ w_0 \in H\}. $$



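We choose the prior $\mu_0 = N(0, A^{-\alpha})$, $\alpha > \frac{d}{2}$. Thus $\mu_0(X) = \mu_0(H) = 1$. Indeed the analysis in subsection
2.4 shows that $\mu_0(\mathcal{H}^t) = 1$, $t < \alpha - \frac{d}{2}$. For the likelihood we assume that $\eta \perp u$ with $\eta \sim Q_0 = N(0, A^{-\beta})$,
and $\beta \in \mathbb{R}$. This measure satisfies $Q_0(\mathcal{H}^t) = 1$ for $t < \beta - \frac{d}{2}$ and we thus choose $Y = \mathcal{H}^{t'}$ for some $t' < \beta - \frac{d}{2}$.
Notice that our analysis includes the case of white observational noise, for which $\beta = 0$. The Cameron-Martin
Theorem, together with the fact that $e^{-\lambda A}$ commutes with arbitrary fractional powers of $A$, can be used to
show that $y|u \sim Q_u := N(G(u), A^{-\beta})$ where $Q_u \ll Q_0$ with

$$ \frac{dQ_u}{dQ_0}(y) = \exp\big(-\Phi(u; y)\big), $$

$$ \Phi(u; y) = \frac{1}{2}\|A^{\frac{\beta}{2}}e^{-A}u\|^2 - \langle A^{\frac{\beta}{2}}e^{-\frac{A}{2}}y, A^{\frac{\beta}{2}}e^{-\frac{A}{2}}u\rangle. $$

In the following we repeatedly use the fact that $A^{\gamma}e^{-\lambda A}$, $\lambda > 0$, is a bounded linear operator from $\mathcal{H}^a$ to $\mathcal{H}^b$,
any $a, b, \gamma \in \mathbb{R}$. Recall that $\nu_0(du, dy) = \mu_0(du)Q_0(dy)$. Note that $\nu_0(H \times \mathcal{H}^{t'}) = 1$. Using the boundedness
of $A^{\gamma}e^{-\lambda A}$ it may be shown that

$$ \Phi : H \times \mathcal{H}^{t'} \to \mathbb{R} $$

is continuous, and hence $\nu_0$-measurable by Lemma 3.2.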

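Theorem 3.3 shows that the posterior is given by $\mu^y$ where

$$ \frac{d\mu^y}{d\mu_0}(u) = \frac{1}{Z}\exp\big(-\Phi(u; y)\big), $$

$$ Z = \int_H \exp\big(-\Phi(u; y)\big)\mu_0(du), $$

provided that $Z > 0$ for $y$ $Q_0$-a.s. Since $y \in \mathcal{H}^{t'}$ for any $t' < \beta - \frac{d}{2}$, $Q_0$-a.s., we have that $y = A^{-t'/2}w_0$ for
some $w_0 \in H$ and $t' < \beta - \frac{d}{2}$. Thus we may write

$$ \Phi(u; y) = \frac{1}{2}\|A^{\frac{\beta}{2}}e^{-A}u\|^2 - \langle A^{\frac{\beta - t'}{2}}e^{-\frac{A}{2}}w_0, A^{\frac{\beta}{2}}e^{-\frac{A}{2}}u\rangle. \tag{3.7} $$

Then, using the boundedness of $A^{\gamma}e^{-\lambda A}$, $\lambda > 0$, together with (3.7), we have

$$ \Phi(u; y) \le C\big(\|u\|^2 + \|w_0\|^2\big) $$

where $\|w_0\|$ is finite $Q_0$-a.s. Thus

$$ Z \ge \int_{\|u\|^2 \le 1} \exp\big(-C(1 + \|w_0\|^2)\big)\mu_0(du) $$

and, since $\mu_0(\|u\|^2 \le 1) > 0$ (all balls have positive measure for Gaussians on a separable Banach space), the
result follows.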



3.4. Elliptic Inverse Problem

We consider the elliptic inverse problem from Section 1.3 from the Bayesian perspective. We consider the
use of both uniform and Gaussian priors. Before studying the inverse problem, however, it is important to
derive some continuity properties of the forward problem. Consider equation (1.5) and define

$$ X^+ = \Big\{v \in L^\infty(D) \,\Big|\, \operatorname*{ess\,inf}_{x \in D} v(x) > 0\Big\}, $$

and define the map $\mathcal{R} : X^+ \to V$ by $\mathcal{R}(\kappa) = p$. This map is well-defined by Lemma 1.5.
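Lemma 3.6. For $i = 1, 2$, let

$$ -\nabla\cdot(\kappa_i\nabla p_i) = f, \quad x \in D, $$
$$ p_i = 0, \quad x \in \partial D. $$

Then

$$ \|p_1 - p_2\|_V \le \frac{1}{\kappa_{\min}^2}\|f\|_{V^*}\|\kappa_1 - \kappa_2\|_{L^\infty} $$

where we assume that

$$ \kappa_{\min} := \operatorname*{ess\,inf}_{x \in D} \kappa_1(x) \wedge \operatorname*{ess\,inf}_{x \in D} \kappa_2(x) > 0. $$

Thus the function $\mathcal{R} : X^+ \to V$ is locally Lipschitz.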
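Proof. Let $e = \kappa_1 - \kappa_2$, $d = p_1 - p_2$. Then

$$ -\nabla\cdot(\kappa_1\nabla d) = \nabla\cdot\big((\kappa_1 - \kappa_2)\nabla p_2\big), \quad x \in D, $$
$$ d = 0, \quad x \in \partial D. $$

By Lemma 1.5 (applied twice) and the Cauchy-Schwarz inequality on $L^2$ we have

$$ \begin{aligned}
\|d\|_V &\le \|(\kappa_2 - \kappa_1)\nabla p_2\|/\kappa_{\min} \\
&\le \|\kappa_2 - \kappa_1\|_{L^\infty}\|p_2\|_V/\kappa_{\min} \\
&\le \frac{1}{\kappa_{\min}^2}\|f\|_{V^*}\|e\|_{L^\infty}.
\end{aligned} $$

□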

We now study the inverse problem of finding $\kappa$ from a finite set of continuous linear functionals $\{l_j\}_{j=1}^{J}$
on $V$, representing measurements of $p$; thus $l_j \in V^*$. We study both the use of uniform priors, and the use
of Gaussian priors. We start with the uniform case, taking $\kappa = u$, and we define $G : X^+ \to \mathbb{R}^J$ by

$$ G_j(u) = l_j\big(\mathcal{R}(u)\big), \quad j = 1, \dots, J. $$

Then $G(u) = (G_1(u), \dots, G_J(u))$. We set $X = L^\infty(D; \mathbb{R})$, $Y = \mathbb{R}^J$ and consider the inverse problem of
finding $u$ from $y$ where

$$ y = G(u) + \eta $$

and $\eta$ is the noise.
Define $X' \subset X^+$ by

$$ X' = \Big\{v \in X \,\Big|\, \frac{1}{1+\delta}\phi_{\min} \le v(x) \le \phi_{\max} + \frac{\delta}{1+\delta}\phi_{\min} \ \text{a.e. } x \in D\Big\}. $$

The measure on functions from subsection 2.2 (found as the pushforward of the measure $\mathbb{P}$ on i.i.d.
sequences, see subsection 2.5) is, by Theorem 2.1, a measure on $X$; furthermore $\mu_0(X') = 1$. We take $\mu_0$ as
the prior.

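The likelihood is defined as follows. We assume $\eta \sim N(0, \Gamma)$, for positive symmetric $\Gamma \in \mathbb{R}^{J \times J}$. Thus
$Q_0 = N(0, \Gamma)$, $Q_u = N(G(u), \Gamma)$ and

$$ \frac{dQ_u}{dQ_0}(y) = \exp\big(-\Phi(u; y)\big), $$

$$ \Phi(u; y) = \frac{1}{2}\big|\Gamma^{-\frac{1}{2}}(y - G(u))\big|^2 - \frac{1}{2}\big|\Gamma^{-\frac{1}{2}}y\big|^2. $$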
Recall that $\nu_0(dy, du) = Q_0(dy)\mu_0(du)$. $G : X' \to \mathbb{R}^J$ is Lipschitz by Lemma 3.6 (in fact we only use that it
is locally Lipschitz) and hence Lemma 3.2 implies that $\Phi : X' \times Y \to \mathbb{R}$ is $\nu_0$-measurable. Thus Theorem
3.3 shows that $u|y \sim \mu^y$ where

$$ \frac{d\mu^y}{d\mu_0}(u) = \frac{1}{Z}\exp\big(-\Phi(u; y)\big), $$

$$ Z = \int_X \exp\big(-\Phi(u; y)\big)\mu_0(du), $$

provided $Z > 0$ for $y$ $Q_0$ almost surely. To see that $Z > 0$ note that

$$ Z = \int_{X'} \exp\big(-\Phi(u; y)\big)\mu_0(du), $$

since $\mu_0(X') = 1$. On $X'$ we have that $\mathcal{R}(\cdot)$ is bounded in $V$, and hence $G$ is bounded in $\mathbb{R}^J$. Furthermore $y$
is finite $Q_0$ almost surely. Thus $\Phi(u; y)$ is bounded by $M = M(y) < \infty$ on $X'$, $Q_0$ almost surely. Hence

$$ Z \ge \int_{X'} \exp(-M)\mu_0(du) = \exp(-M) > 0, $$

and the result is proved.

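We may use Remark 3.5 to shift $\Phi$ by $\frac{1}{2}|\Gamma^{-\frac{1}{2}}y|^2$, since this is almost surely finite under $Q_0$ and hence
under $\nu(du, dy) = Q_u(dy)\mu_0(du)$. We then obtain the equivalent form for the posterior distribution $\mu^y$:

$$ \frac{d\mu^y}{d\mu_0}(u) = \frac{1}{Z}\exp\Big(-\frac{1}{2}\big|\Gamma^{-\frac{1}{2}}(y - G(u))\big|^2\Big), \tag{3.8a} $$

$$ Z = \int_X \exp\Big(-\frac{1}{2}\big|\Gamma^{-\frac{1}{2}}(y - G(u))\big|^2\Big)\mu_0(du). \tag{3.8b} $$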



We conclude this subsection by discussing the same inverse problem, but using Gaussian priors from
subsection 2.4. We again set $X = L^\infty(D; \mathbb{R})$, $Y = \mathbb{R}^J$ and, for simplicity, take $D = [0, 1]^d$. We now take
$\kappa = \exp(u)$, and define $G : X \to \mathbb{R}^J$ by

$$ G_j(u) = l_j\big(\mathcal{R}(\exp(u))\big), \quad j = 1, \dots, J. $$

We take as prior on $u$ the measure $N(0, A^{-\alpha})$, from the example preceding the Fernique Theorem 2.6, with
$\alpha > d/2$. The measure $\mu_0$ then satisfies $\mu_0(X') = 1$ with $X' = C(D; \mathbb{R})$. The likelihood is unchanged by the
prior, since it concerns $y$ given $u$, and is hence identical to that in the case of the uniform prior, although
the mean shift from $Q_0$ to $Q_u$ by $G(u)$ now has a different interpretation. Thus we again obtain (3.8) for
the posterior distribution (albeit with a different definition of $G(u)$) provided that we can establish that

$$ Z = \int_X \exp\Big(-\frac{1}{2}\big|\Gamma^{-\frac{1}{2}}(y - G(u))\big|^2\Big)\mu_0(du) > 0. $$

To this end we use the fact that the unit ball in $X'$, denoted $B$, has positive measure, and that on this ball
$\mathcal{R}(\exp(u))$ is bounded in $V$ by $e\|f\|_{V^*}$, by Lemma 1.5, since $\kappa = \exp(u) \ge e^{-1}$ on this ball
$B$. Thus $G$ is bounded on $B$ and, noting that $y$ is $Q_0$-a.s. finite, we have for some $M = M(y) < \infty$,

$$ \sup_{u \in B}\Big(\frac{1}{2}\big|\Gamma^{-\frac{1}{2}}(y - G(u))\big|^2 - \frac{1}{2}\big|\Gamma^{-\frac{1}{2}}y\big|^2\Big) < M. $$

Hence

$$ Z \ge \int_B \exp(-M)\mu_0(du) = \exp(-M)\mu_0(B) > 0. $$

Thus we again obtain (3.6) for the posterior measure, now with the new definition of $G$, and hence
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4. Common Structure 



In this section we discuss various common features of the posterior distribution arising from the Bayesian
approach to inverse problems. We start, in subsection 4.1, by studying the continuity properties of the
posterior with respect to changes in data, proving a form of well-posedness; indeed we show that the posterior
is Lipschitz in the data with respect to the Hellinger metric. In subsection 4.2 we use similar ideas to study
the effect of approximation on the posterior distribution, showing that small changes in the potential $\Phi$ lead
to small changes in the posterior distribution, again in the Hellinger metric; this work may be used to translate
error analysis pertaining to the forward problem into estimates on errors in the posterior distribution. In
the remaining two subsections we work entirely in the case of Gaussian prior measure $\mu_0$. Subsection 4.3 is
concerned with derivation and study of a Langevin equation which is invariant with respect to the posterior
$\mu$, and subsection 4.4 concerns MCMC methods, also invariant with respect to $\mu$, which exploit the structure
of a target measure defined via density with respect to a Gaussian; in particular, the idea of using proposals
which preserve the prior is introduced and benefits of doing so are explained.



4.1. Well-Posedness

In many classical inverse problems small changes in the data can induce arbitrarily large changes in the 
solution, and some form of regularization is needed to counteract this ill-posedness. We illustrate this effect 
with the inverse heat equation example. We then proceed to show that the Bayesian approach to inversion
has the property that small changes in the data lead to small changes in the posterior distribution. Thus 
working with probability measures on the solution space, and adopting suitable priors, provides a form of 
regularization. 

Example. Consider the heat equation introduced in subsection 1.2. Let $y = e^{-A}u$ and consider data $y' =
e^{-A}u + \eta$ where $\eta = \epsilon\phi_j$ represents noise. Thus $\|\eta\| = \epsilon$. It is natural to apply the inverse of $e^{-A}$ to $y$ and
to $y'$ to understand the effect of the noise. This yields the following:

$$\|e^{A}y - e^{A}y'\| = \|e^{A}(y - y')\| = \|e^{A}\eta\| = \epsilon\|e^{A}\phi_j\| = \epsilon e^{\alpha_j}.$$

Recall that, by Lemma 1.3, $\alpha_j \asymp j^{2/d}$. Thus, for large enough $j$ we can ensure that $\alpha_j = (a + 1)\log(\epsilon^{-1})$ for
some $a > 0$ so that $\|y - y'\| = O(\epsilon)$ whilst $\|e^{A}y - e^{A}y'\| = O(\epsilon^{-a})$; the degree of ill-posedness can be made
arbitrarily bad by choice of $a$ arbitrarily large. □
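
The amplification in this example can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original text: the function name `amplification` and the sample values of $\epsilon$ and $a$ are illustrative assumptions, and the eigenvalue is simply set to $(a+1)\log(\epsilon^{-1})$ as in the example rather than computed from an operator.

```python
import math

def amplification(eps, a):
    """Return (data error, reconstruction error) for the heat-equation example.

    eps is the noise size and a > 0 the exponent from the text; both are
    illustrative inputs, not values taken from the source.
    """
    alpha_j = (a + 1.0) * math.log(1.0 / eps)  # eigenvalue chosen as in the example
    data_err = eps                             # ||y - y'|| = eps
    recon_err = eps * math.exp(alpha_j)        # ||e^A y - e^A y'|| = eps * e^{alpha_j}
    return data_err, recon_err

for eps in (1e-1, 1e-2, 1e-3):
    d, r = amplification(eps, a=2.0)
    print(f"eps={eps:.0e}  data error={d:.1e}  reconstruction error={r:.3e}")
```

With $a = 2$ the reconstruction error grows like $\epsilon^{-2}$ while the data error shrinks like $\epsilon$.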

Our aim in this section is to show that this ill-posedness effect does not occur in the Bayesian posterior
distribution: small changes in the data $y$ lead to small changes in the measure $\mu^y$. Let $X$, $Y$ be separable
Banach spaces, and $\mu_0$ a measure on $X$. Assume that $\mu^y \ll \mu_0$ and that, for some $\Phi : X \times Y \to \mathbb{R}$,

$$\frac{d\mu^y}{d\mu_0}(u) = \frac{1}{Z(y)}\exp\bigl(-\Phi(u;y)\bigr), \tag{4.1a}$$

$$Z(y) = \int_X \exp\bigl(-\Phi(u;y)\bigr)\mu_0(du). \tag{4.1b}$$

We make the following assumptions concerning $\Phi$:

Assumptions 4.1. Let $X' \subseteq X$ and assume that $\Phi \in C(X' \times Y;\mathbb{R})$ is Lipschitz on bounded sets. Assume
further that there are functions $M_i : \mathbb{R}^+ \times \mathbb{R}^+ \to \mathbb{R}^+$, $i = 1,2$, monotonic non-decreasing separately in each
argument, and with $M_2$ strictly positive, such that for all $u \in X'$, $y, y_1, y_2 \in B_Y(0,r)$,

$$\Phi(u;y) \ge -M_1(r, \|u\|_X),$$

$$|\Phi(u;y_1) - \Phi(u;y_2)| \le M_2(r, \|u\|_X)\|y_1 - y_2\|_Y.$$
In order to measure the effect of changes in $y$ on the measure $\mu^y$ we need a metric on measures. We use
the Hellinger distance defined as follows: given two measures $\mu$ and $\mu'$ on $X$, both absolutely continuous
with respect to a common reference measure $\nu$, the Hellinger distance is

$$d_{\rm Hell}(\mu,\mu') = \sqrt{\frac12\int_X\Bigl(\sqrt{\frac{d\mu}{d\nu}} - \sqrt{\frac{d\mu'}{d\nu}}\Bigr)^2 d\nu}.$$

In particular, if $\mu'$ is absolutely continuous with respect to $\mu$ then

$$d_{\rm Hell}(\mu,\mu') = \sqrt{\frac12\int_X\Bigl(1 - \sqrt{\frac{d\mu'}{d\mu}}\Bigr)^2 d\mu}.$$
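
For intuition, the definition can be evaluated directly for measures on a finite set, taking counting measure as the reference measure $\nu$. The sketch below is an illustration only; the function name `hellinger` and the example vectors are assumptions introduced here.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two probability vectors.

    The reference measure is counting measure, so the densities are the
    vectors themselves; the factor 1/2 matches the definition in the text.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(hellinger(p, p))             # identical measures
print(hellinger(p, q))             # partially overlapping supports
print(hellinger([1, 0], [0, 1]))   # mutually singular measures
```

With this normalisation the distance takes values in $[0,1]$, and equals 1 exactly when the two measures are mutually singular.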



Theorem 4.2. Let Assumptions 4.1 hold. Assume that $\mu_0(X') = 1$ and that $\mu_0(X' \cap B) > 0$ for some
bounded set $B$ in $X$. Then, for every $y \in Y$, $Z(y)$ given by (4.1b) is positive and probability measure $\mu^y$
given by (4.1a) is well-defined.

Proof. Since $u \sim \mu_0$ satisfies $u \in X'$ a.s., we have

$$Z(y) = \int_{X'} \exp\bigl(-\Phi(u;y)\bigr)\mu_0(du).$$

Note that $B' = X' \cap B$ is bounded in $X$. Define

$$R_1 := \sup_{u \in B'} \|u\|_X < \infty.$$

Since $\Phi : X' \times Y \to \mathbb{R}$ is continuous it is finite at every point in $B' \times \{y\}$. Thus, by the continuity of $\Phi(\cdot;\cdot)$
implied by Assumptions 4.1, we see that

$$\sup_{(u,y) \in B' \times B_Y(0,r)} \Phi(u;y) = R_2 < \infty.$$

Hence

$$Z(y) \ge \int_{B'} \exp(-R_2)\,\mu_0(du) = \exp(-R_2)\mu_0(B').$$

Since $\mu_0(B')$ is assumed positive and $R_2$ is finite we deduce that $Z(y) > 0$. □

Theorem 4.3. Let Assumptions 4.1 hold. Assume that $\mu_0(X') = 1$ and that $\mu_0(X' \cap B) > 0$ for some
bounded set $B$ in $X$. Assume additionally that, for every fixed $r > 0$,

$$\exp\bigl(M_1(r,\|u\|_X)\bigr)M_2(r,\|u\|_X)^2 \in L^1_{\mu_0}(X;\mathbb{R}).$$

Then there is $C = C(r) > 0$ such that, for all $y, y' \in B_Y(0,r)$,

$$d_{\rm Hell}(\mu^y, \mu^{y'}) \le C\|y - y'\|_Y.$$

Proof. Throughout this proof we use $C$ to denote a constant independent of $u$, but possibly depending
on the fixed value of $r$; it may change from occurrence to occurrence. We use the fact that, since $M_2(r, \cdot)$ is
monotonic non-decreasing and since it is strictly positive on $[0, \infty)$, there is constant $C > 0$ such that

$$\exp\bigl(M_1(r, \|u\|_X)\bigr)M_2(r, \|u\|_X) \le C\exp\bigl(M_1(r, \|u\|_X)\bigr)M_2(r, \|u\|_X)^2, \tag{4.2a}$$

$$\exp\bigl(M_1(r, \|u\|_X)\bigr) \le C\exp\bigl(M_1(r, \|u\|_X)\bigr)M_2(r, \|u\|_X)^2. \tag{4.2b}$$

Let $Z = Z(y)$ and $Z' = Z(y')$ denote the normalization constants for $\mu^y$ and $\mu^{y'}$ so that, by Theorem 4.2,

$$Z = \int_{X'} \exp\bigl(-\Phi(u;y)\bigr)\mu_0(du) > 0,$$

$$Z' = \int_{X'} \exp\bigl(-\Phi(u;y')\bigr)\mu_0(du) > 0.$$






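Then, using the local Lipschitz property of the exponential and the assumed Lipschitz continuity of $\Phi(u; \cdot)$,
together with (4.2a), we have

$$|Z - Z'| \le \int_{X'} \bigl|\exp(-\Phi(u;y)) - \exp(-\Phi(u;y'))\bigr|\mu_0(du)$$

$$\le \int_{X'} \exp\bigl(M_1(r, \|u\|_X)\bigr)|\Phi(u;y) - \Phi(u;y')|\mu_0(du)$$

$$\le \Bigl(\int_{X'} \exp\bigl(M_1(r, \|u\|_X)\bigr)M_2(r, \|u\|_X)\mu_0(du)\Bigr)\|y - y'\|_Y$$

$$\le C\Bigl(\int_{X'} \exp\bigl(M_1(r, \|u\|_X)\bigr)M_2(r, \|u\|_X)^2\mu_0(du)\Bigr)\|y - y'\|_Y$$

$$\le C\|y - y'\|_Y.$$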

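The last line follows because the integrand is in $L^1_{\mu_0}$ by assumption. From the definition of Hellinger distance
we have

$$\bigl(d_{\rm Hell}(\mu^y, \mu^{y'})\bigr)^2 \le I_1 + I_2,$$

where

$$I_1 = \frac{1}{Z}\int_{X'}\Bigl(\exp\bigl(-\tfrac12\Phi(u;y)\bigr) - \exp\bigl(-\tfrac12\Phi(u;y')\bigr)\Bigr)^2\mu_0(du),$$

$$I_2 = \bigl|Z^{-1/2} - (Z')^{-1/2}\bigr|^2\int_{X'}\exp\bigl(-\Phi(u;y')\bigr)\mu_0(du).$$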
JX' 

Note that, again using similar Lipschitz calculations to those above, using the fact that $Z > 0$ and
Assumptions 4.1,

$$I_1 \le \frac{1}{Z}\int_{X'} \exp\bigl(M_1(r,\|u\|_X)\bigr)|\Phi(u;y) - \Phi(u;y')|^2\mu_0(du)$$

$$\le \frac{1}{Z}\Bigl(\int_{X'} \exp\bigl(M_1(r, \|u\|_X)\bigr)M_2(r, \|u\|_X)^2\mu_0(du)\Bigr)\|y - y'\|_Y^2$$

$$\le C\|y - y'\|_Y^2.$$

Also, using Assumptions 4.1, together with (4.2b),

$$\int_{X'} \exp\bigl(-\Phi(u;y')\bigr)\mu_0(du) \le \int_{X'} \exp\bigl(M_1(r, \|u\|_X)\bigr)\mu_0(du)$$

$$\le C\int_{X'} \exp\bigl(M_1(r, \|u\|_X)\bigr)M_2(r, \|u\|_X)^2\mu_0(du) < \infty.$$

Hence

$$I_2 \le C\bigl(Z^{-3} \vee (Z')^{-3}\bigr)|Z - Z'|^2 \le C\|y - y'\|_Y^2.$$

The result is complete. □
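
The Lipschitz dependence on the data can be seen explicitly in a scalar conjugate Gaussian example. The sketch below is an illustration under assumptions not made in the text: the prior is $N(0,1)$, the data are $y = u + \eta$ with $\eta \sim N(0,\gamma^2)$, and the function names are hypothetical. It uses the standard closed form for the Hellinger distance between two Gaussians with equal variance, $d_{\rm Hell}^2 = 1 - \exp(-(m_1 - m_2)^2/(8\sigma^2))$, which corresponds to the normalisation with the factor $1/2$ used above.

```python
import math

def posterior(y, gamma):
    """Posterior mean and variance for prior N(0,1) and data y = u + N(0, gamma^2)."""
    var = gamma**2 / (1.0 + gamma**2)
    mean = y / (1.0 + gamma**2)
    return mean, var

def hellinger_equal_var(m1, m2, var):
    """Hellinger distance between N(m1, var) and N(m2, var)."""
    x = (m1 - m2) ** 2 / (8.0 * var)
    return math.sqrt(-math.expm1(-x))   # sqrt(1 - exp(-x)), computed stably

gamma = 0.5   # illustrative noise standard deviation
y = 1.0       # illustrative datum
for dy in (1e-1, 1e-2, 1e-3):
    m1, var = posterior(y, gamma)
    m2, _ = posterior(y + dy, gamma)
    d = hellinger_equal_var(m1, m2, var)
    print(f"|y - y'| = {dy:.0e}   d_Hell = {d:.3e}   ratio = {d / dy:.4f}")
```

Since $1 - e^{-x} \le x$, the ratio $d_{\rm Hell}/|y - y'|$ in this example is bounded by $1/((1+\gamma^2)\sqrt{8\sigma^2})$ with $\sigma^2$ the posterior variance, which is the kind of bound Theorem 4.3 asserts in general.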

Remark 4.4. The Hellinger metric has the very desirable property that it translates directly into bounds on
expectations. For functions $f$ which are in $L^2_{\mu^y}(X;\mathbb{R})$ and $L^2_{\mu^{y'}}(X;\mathbb{R})$ the closeness of the Hellinger metric
implies closeness of expectations of $f$. To be precise, for $y, y' \in B_Y(0,r)$ and $C = C(r)$, we have

$$|\mathbb{E}^{\mu^y} f(u) - \mathbb{E}^{\mu^{y'}} f(u)| \le C d_{\rm Hell}(\mu^y, \mu^{y'})$$

so that then

$$|\mathbb{E}^{\mu^y} f(u) - \mathbb{E}^{\mu^{y'}} f(u)| \le C\|y - y'\|.$$
4.2. Approximation



In this section we concentrate on continuity properties of the posterior measure with respect to approximation
of the potential $\Phi$. The methods used are very similar to those in the previous subsection, and we establish
a continuity property of the posterior distribution, in the Hellinger metric, with respect to small changes in
the potential $\Phi$.

Because the data $y$ plays no explicit role in this discussion, we drop explicit reference to it. Let $X$ be a
Banach space and $\mu_0$ a measure on $X$. Assume that $\mu$ and $\mu^N$ are both absolutely continuous with respect
to $\mu_0$ and given by

$$\frac{d\mu}{d\mu_0}(u) = \frac{1}{Z}\exp\bigl(-\Phi(u)\bigr), \tag{4.3a}$$

$$Z = \int_{X'} \exp\bigl(-\Phi(u)\bigr)\mu_0(du) \tag{4.3b}$$

and

$$\frac{d\mu^N}{d\mu_0}(u) = \frac{1}{Z^N}\exp\bigl(-\Phi^N(u)\bigr), \tag{4.4a}$$

$$Z^N = \int_{X'} \exp\bigl(-\Phi^N(u)\bigr)\mu_0(du) \tag{4.4b}$$

respectively. The measure $\mu^N$ might arise, for example, through an approximation of the forward map $G$
underlying an inverse problem of the form (3.2). It is natural to ask whether closeness of the forward map
and its approximation imply closeness of the posterior measures. We now address this question.

Assumptions 4.5. Let $X' \subseteq X$ and assume that $\Phi \in C(X'; \mathbb{R})$ is Lipschitz on bounded sets. Assume further
that there are functions $M_i : \mathbb{R}^+ \to \mathbb{R}^+$, $i = 1,2$, independent of $N$ and monotonic non-decreasing separately
in each argument, and with $M_2$ strictly positive, such that for all $u \in X'$,

$$\Phi(u) \ge -M_1(\|u\|_X),$$

$$|\Phi(u) - \Phi^N(u)| \le M_2(\|u\|_X)\psi(N),$$

where $\psi(N) \to 0$ as $N \to \infty$.

The following two theorems are very similar to Theorems 4.2, 4.3 and the proofs are adapted to estimate 
changes in the posterior caused by changes in the potential rather than the data y. 

Theorem 4.6. Let Assumptions 4.5 hold. Assume that $\mu_0(X') = 1$ and that $\mu_0(X' \cap B) > 0$ for some
bounded set $B$ in $X$. Then $Z$ given by (4.3b) is positive and probability measure $\mu$ given by (4.3a) is
well-defined. Furthermore, for sufficiently large $N$, $Z^N$ given by (4.4b) is bounded below by a positive constant
independent of $N$, and probability measure $\mu^N$ given by (4.4a) is well-defined.

Proof. Since $u \sim \mu_0$ satisfies $u \in X'$ a.s., we have

$$Z = \int_{X'} \exp\bigl(-\Phi(u)\bigr)\mu_0(du).$$

Note that $B' = X' \cap B$ is bounded in $X$. Thus

$$R_1 := \sup_{u \in B'} \|u\|_X < \infty.$$

Since $\Phi : X' \to \mathbb{R}$ is continuous it is finite at every point in $B'$. Thus, by the continuity of $\Phi(\cdot)$ implied by
Assumptions 4.5, we see that

$$\sup_{u \in B'} \Phi(u) = R_2 < \infty.$$

Hence

$$Z \ge \int_{B'} \exp(-R_2)\,\mu_0(du) = \exp(-R_2)\mu_0(B').$$

Since $\mu_0(B')$ is assumed positive and $R_2$ is finite we deduce that $Z > 0$. By Assumptions 4.5 we may
choose $N$ large enough so that

$$\sup_{u \in B'} |\Phi(u) - \Phi^N(u)| \le R_2$$

so that

$$\sup_{u \in B'} \Phi^N(u) \le 2R_2 < \infty.$$

Hence

$$Z^N \ge \int_{B'} \exp(-2R_2)\,\mu_0(du) = \exp(-2R_2)\mu_0(B').$$

Since $\mu_0(B')$ is assumed positive and $R_2$ is finite we deduce that $Z^N > 0$. Furthermore, the lower bound is
independent of $N$, as required. □

Theorem 4.7. Let Assumptions 4.5 hold. Assume that $\mu_0(X') = 1$ and that $\mu_0(X' \cap B) > 0$ for some
bounded set $B$ in $X$. Assume additionally that

$$\exp\bigl(M_1(\|u\|_X)\bigr)M_2(\|u\|_X)^2 \in L^1_{\mu_0}(X;\mathbb{R}).$$

Then there is $C > 0$ such that, for all $N$ sufficiently large,

$$d_{\rm Hell}(\mu, \mu^N) \le C\psi(N).$$

Proof. Throughout this proof we use $C$ to denote a constant independent of $u$ and $N$; it may change from
occurrence to occurrence. We use the fact that, since $M_2(\cdot)$ is monotonic non-decreasing and since it is strictly
positive on $[0,\infty)$, there is constant $C > 0$ such that

$$\exp\bigl(M_1(\|u\|_X)\bigr)M_2(\|u\|_X) \le C\exp\bigl(M_1(\|u\|_X)\bigr)M_2(\|u\|_X)^2, \tag{4.5a}$$

$$\exp\bigl(M_1(\|u\|_X)\bigr) \le C\exp\bigl(M_1(\|u\|_X)\bigr)M_2(\|u\|_X)^2. \tag{4.5b}$$

Let $Z$ and $Z^N$ denote the normalization constants for $\mu$ and $\mu^N$ so that for all $N$ sufficiently large, by
Theorem 4.6,

$$Z = \int_{X'} \exp\bigl(-\Phi(u)\bigr)\mu_0(du) > 0,$$

$$Z^N = \int_{X'} \exp\bigl(-\Phi^N(u)\bigr)\mu_0(du) > 0,$$

with lower bounds independent of $N$. Then, using the local Lipschitz property of the exponential and the
assumed Lipschitz continuity of $\Phi(\cdot)$, together with (4.5a), we have

$$|Z - Z^N| \le \int_{X'} \bigl|\exp(-\Phi(u)) - \exp(-\Phi^N(u))\bigr|\mu_0(du)$$

$$\le \int_{X'} \exp\bigl(M_1(\|u\|_X)\bigr)|\Phi(u) - \Phi^N(u)|\mu_0(du)$$

$$\le \Bigl(\int_{X'} \exp\bigl(M_1(\|u\|_X)\bigr)M_2(\|u\|_X)\mu_0(du)\Bigr)\psi(N)$$

$$\le C\Bigl(\int_{X'} \exp\bigl(M_1(\|u\|_X)\bigr)M_2(\|u\|_X)^2\mu_0(du)\Bigr)\psi(N)$$

$$\le C\psi(N).$$
The last line follows because the integrand is in $L^1_{\mu_0}$ by assumption. From the definition of Hellinger distance
we have

$$\bigl(d_{\rm Hell}(\mu, \mu^N)\bigr)^2 \le I_1 + I_2,$$

where

$$I_1 = \frac{1}{Z}\int_{X'}\Bigl(\exp\bigl(-\tfrac12\Phi(u)\bigr) - \exp\bigl(-\tfrac12\Phi^N(u)\bigr)\Bigr)^2\mu_0(du),$$

$$I_2 = \bigl|Z^{-1/2} - (Z^N)^{-1/2}\bigr|^2\int_{X'}\exp\bigl(-\Phi^N(u)\bigr)\mu_0(du).$$

Note that, again using similar Lipschitz calculations to those above, using the fact (Theorem 4.6) that
$Z, Z^N > 0$ uniformly as $N \to \infty$, and Assumptions 4.5,

$$I_1 \le \frac{1}{Z}\int_{X'} \exp\bigl(M_1(\|u\|_X)\bigr)|\Phi(u) - \Phi^N(u)|^2\mu_0(du)$$

$$\le \frac{1}{Z}\Bigl(\int_{X'} \exp\bigl(M_1(\|u\|_X)\bigr)M_2(\|u\|_X)^2\mu_0(du)\Bigr)\psi(N)^2$$

$$\le C\psi(N)^2.$$

Also, using Assumptions 4.5, together with (4.5b),

$$\int_{X'} \exp\bigl(-\Phi^N(u)\bigr)\mu_0(du) \le \int_{X'} \exp\bigl(M_1(\|u\|_X)\bigr)\mu_0(du)$$

$$\le C\int_{X'} \exp\bigl(M_1(\|u\|_X)\bigr)M_2(\|u\|_X)^2\mu_0(du) < \infty,$$

and the upper bound is independent of $N$. Hence

$$I_2 \le C\bigl(Z^{-3} \vee (Z^N)^{-3}\bigr)|Z - Z^N|^2 \le C\psi(N)^2.$$

The result is complete. □

Remark 4.8. Using the ideas underlying Remark 4.4, this result enables us to translate errors arising
from approximation of the forward problem into errors in the Bayesian solution of the inverse problem.
Furthermore, the errors in the forward and inverse problems scale the same way with respect to $N$. For
functions $f$ which are in $L^2_{\mu}$ and $L^2_{\mu^N}$, uniformly with respect to $N$, the closeness of the Hellinger metric
implies closeness of expectations of $f$:

$$|\mathbb{E}^{\mu} f(u) - \mathbb{E}^{\mu^N} f(u)| \le C\psi(N).$$
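
The effect described in this subsection can be observed numerically in one dimension. The sketch below is purely illustrative and rests on assumptions not in the text: the prior is $N(0,1)$, the forward map is $G(u) = \sin(u)$ with unit observational noise, the datum $y = 0.5$ is invented, and the approximations $G^N$ are truncated Taylor series. Integrals against $\mu_0$ are approximated by a simple Riemann sum on a grid, so the printed distances are quadrature approximations rather than exact values.

```python
import math
import numpy as np

# Grid for a scalar unknown u with standard Gaussian prior mu_0 = N(0, 1).
u = np.linspace(-8.0, 8.0, 4001)
du = u[1] - u[0]
prior = np.exp(-0.5 * u**2) / math.sqrt(2.0 * math.pi)

y = 0.5  # hypothetical datum

def taylor_sin(x, n_terms):
    """Truncated Taylor series of sin with n_terms terms (approximate forward map)."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
               for k in range(n_terms))

def posterior_density(G):
    """Density of the posterior w.r.t. mu_0 for Phi(u) = 0.5 * (y - G(u))^2."""
    w = np.exp(-0.5 * (y - G) ** 2)
    Z = np.sum(w * prior) * du          # normalization constant by Riemann sum
    return w / Z

def hellinger(f, g):
    """Hellinger distance between measures with densities f, g w.r.t. mu_0."""
    return math.sqrt(0.5 * np.sum((np.sqrt(f) - np.sqrt(g)) ** 2 * prior) * du)

exact = posterior_density(np.sin(u))
dists = [hellinger(exact, posterior_density(taylor_sin(u, n))) for n in (2, 4, 8)]
for n, d in zip((2, 4, 8), dists):
    print(f"terms = {n}   d_Hell(mu, mu^N) = {d:.3e}")
```

The distances decrease as more terms are kept, mirroring the statement that the posterior error is controlled by the forward-map error $\psi(N)$; no attempt is made here to compute $\psi(N)$ itself.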



4.3. Measure Preserving Dynamics

The aim of this section is to exhibit a Hilbert space valued stochastic differential equation (SDE), which in
many applications has an interpretation as a stochastic partial differential equation (SPDE), and which is
invariant with respect to the posterior measure $\mu^y$ constructed in subsection 3.2. We restrict ourselves to
the case of Gaussian priors $\mu_0$. The data $y$ plays no role in what follows and indeed the theory applies to a
wide range of measures $\mu$ which have density with respect to a Gaussian prior $\mu_0$ including, but not limited
to, Bayesian inverse problems; we work in this general setting.

Let $\mu_0 = N(0,\mathcal{C})$ be a Gaussian measure on Hilbert space $(\mathcal{H}, \langle\cdot,\cdot\rangle, \|\cdot\|)$. We assume that $\mu \ll \mu_0$ is given
by

$$\frac{d\mu}{d\mu_0}(u) = \frac{1}{Z}\exp\bigl(-\Phi(u)\bigr), \tag{4.6a}$$

$$Z = \int_{\mathcal{H}} \exp\bigl(-\Phi(u)\bigr)\mu_0(du) \tag{4.6b}$$
where $Z \in (0, \infty)$. We assume that $\Phi : X \to \mathbb{R}$ where $X \subset \mathcal{H}$ satisfies $\mu_0(X) = 1$. We now specify $X$, thereby
linking the properties of the reference measure $\mu_0$ and the potential $\Phi$.
We assume that $\mathcal{C}$ has eigendecomposition

$$\mathcal{C}\phi_j = \gamma_j^2\phi_j \tag{4.7}$$

where $\{\phi_j\}_{j=1}^\infty$ forms an orthonormal basis for $\mathcal{H}$, and where $\gamma_j \asymp j^{-s}$. Necessarily $s > \tfrac12$ since $\mathcal{C}$ must be
trace-class to be a covariance on $\mathcal{H}$. We define the following scale of Hilbert subspaces, defined for $r > 0$, by

$$\mathcal{H}^r = \Bigl\{u \in \mathcal{H} \Bigm| \sum_{j=1}^\infty j^{2r}|\langle u, \phi_j\rangle|^2 < \infty\Bigr\}$$

and then extend to superspaces $r < 0$ by duality. We use $\|\cdot\|_r$ to denote the norm induced by the inner-product

$$\langle u, v\rangle_r = \sum_{j=1}^\infty j^{2r}u_jv_j$$

for $u_j = \langle u, \phi_j\rangle$ and $v_j = \langle v, \phi_j\rangle$. Application of Theorem 2.2 with $d = q = 1$ shows that $\mu_0(\mathcal{H}^r) = 1$ for all
$r \in [0, s - \tfrac12)$. In what follows we will take $X = \mathcal{H}^t$ for some fixed $t \in [0, s - \tfrac12)$.

Notice that we have not assumed that the underlying Hilbert space is comprised of $L^2$ functions mapping
$D \subset \mathbb{R}^d$ into $\mathbb{R}$, and hence we have not introduced the dimension $d$ of an underlying physical space $\mathbb{R}^d$ into
either the decay assumptions on the $\gamma_j$ or the spaces $\mathcal{H}^r$. However, note that the spaces $\mathcal{H}^t$ introduced in
subsection 2.4 are, in the case where $\mathcal{H} = L^2(D;\mathbb{R})$, the same as the spaces $\mathcal{H}^{t/d}$.

The aim of this section is to show that the equation

$$\frac{du}{dt} = -u - \mathcal{C}D\Phi(u) + \sqrt{2}\frac{dW}{dt}, \quad u(0) = u_0, \tag{4.8}$$

preserves the measure $\mu$, where $W$ is a $\mathcal{C}$-Wiener process, defined below. Precisely we show that if $u_0 \sim \mu$
then $\mathbb{E}\varphi(u(t)) = \mathbb{E}\varphi(u_0)$ for all $t > 0$ for continuous bounded $\varphi$ defined on an appropriately chosen subspace
$X$ of $\mathcal{H}$, under boundedness conditions on $\Phi$ and its derivatives.

In subsection 4.3.1 we introduce a family of Langevin equations which are invariant with respect to a given
measure with smooth Lebesgue density. Using this, in subsection 4.3.2, we motivate equation (4.8) showing
that, in finite dimensions, it corresponds to a particular choice of Langevin equation. In subsection 4.3.3
we describe the precise assumptions under which we will prove invariance of measure $\mu$ under the dynamics
(4.8). Subsection 4.3.4 describes the elements of the finite dimensional approximation of (4.8) which will
underlie our proof of invariance. Finally, subsection 4.3.5 contains the statement of the measure invariance result
as Theorem 4.19, together with its proof; this is preceded by Theorem 4.17 which establishes existence and
uniqueness of a solution to (4.8), as well as continuous dependence of the solution on the initial condition
and Brownian forcing. Theorems 4.11 and 4.9 are the finite dimensional analogues of Theorems 4.19 and
4.17 respectively and play a useful role in motivating the infinite dimensional theory.

4.3.1. Finite Dimensional Case

Before setting up the (rather involved) technical assumptions required for our proof of measure invariance,
we give some finite-dimensional intuition. Recall that $|\cdot|$ denotes the Euclidean norm on $\mathbb{R}^n$ and we also use
this notation for the induced matrix norm on $\mathbb{R}^n$. We assume that

$$I \in C^2(\mathbb{R}^n, \mathbb{R}^+), \quad \int_{\mathbb{R}^n} e^{-I(u)}du = 1.$$

Thus $\rho(u) = e^{-I(u)}$ is the Lebesgue density corresponding to a random variable on $\mathbb{R}^n$. Let $\mu$ be the
corresponding measure.
Let $\mathbb{W}$ denote standard Wiener measure on $\mathbb{R}^n$. Thus $B \sim \mathbb{W}$ is a standard Brownian motion in
$C([0,\infty);\mathbb{R}^n)$. Let $u \in C([0,\infty);\mathbb{R}^n)$ satisfy the SDE

$$\frac{du}{dt} = -A\,DI(u) + \sqrt{2A}\frac{dB}{dt}, \quad u(0) = u_0 \tag{4.9}$$

where $A \in \mathbb{R}^{n\times n}$ is symmetric and strictly positive definite and $DI \in C^1(\mathbb{R}^n, \mathbb{R}^n)$ is the gradient of $I$.
Assume that $\exists M > 0 : \forall u \in \mathbb{R}^n$, the Hessian of $I$ satisfies

$$|D^2I(u)| \le M.$$

We refer to equations of the form (4.9) as Langevin equations, and the matrix $A$ as a preconditioner.

Theorem 4.9. For every $u_0 \in \mathbb{R}^n$ and $\mathbb{W}$-a.s., equation (4.9) has a unique global in time solution $u \in
C([0,\infty);\mathbb{R}^n)$.

Proof. A solution of the SDE is a solution of the integral equation

$$u(t) = u_0 - \int_0^t A\,DI(u(s))\,ds + \sqrt{2A}B(t). \tag{4.10}$$

Define $X = C([0,T];\mathbb{R}^n)$ and $\mathcal{F} : X \to X$ by

$$(\mathcal{F}v)(t) = u_0 - \int_0^t A\,DI(v(s))\,ds + \sqrt{2A}B(t). \tag{4.11}$$

Thus $u \in X$ solving (4.10) is a fixed point of $\mathcal{F}$. We show that $\mathcal{F}$ has a unique fixed point, for $T$ sufficiently
small. To this end we study a contraction property of $\mathcal{F}$:

$$\|(\mathcal{F}v_1) - (\mathcal{F}v_2)\|_X = \sup_{0 \le t \le T}\Bigl|\int_0^t \bigl(A\,DI(v_1(s)) - A\,DI(v_2(s))\bigr)ds\Bigr|$$

$$\le \int_0^T \bigl|A\,DI(v_1(s)) - A\,DI(v_2(s))\bigr|ds$$

$$\le \int_0^T |A|M|v_1(s) - v_2(s)|ds$$

$$\le T|A|M\|v_1 - v_2\|_X.$$

Choosing $T : T|A|M < 1$ shows that $\mathcal{F}$ is a contraction on $X$. This argument may be repeated on successive
intervals $[T, 2T], [2T, 3T], \dots$ to obtain a unique global solution in $C([0,\infty);\mathbb{R}^n)$. □

Remark 4.10. Note that, since $A$ is positive-definite symmetric, its eigenvectors $e_j$ form an orthonormal
basis for $\mathbb{R}^n$. We write $Ae_j = \alpha_j^2e_j$. Thus

$$B(t) = \sum_{j=1}^n \beta_j(t)e_j$$

where the $\{\beta_j\}_{j=1}^n$ are an i.i.d. collection of standard unit Brownian motions on $\mathbb{R}$. Thus we obtain

$$\sqrt{A}B(t) = \sum_{j=1}^n \alpha_j\beta_j(t)e_j := W(t).$$

We refer to $W$ as an $A$-Wiener process. Such a process is Gaussian with mean zero and correlation structure

$$\mathbb{E}\,W(t) \otimes W(s) = A(t \wedge s).$$

The equation (4.9) may be written as

$$\frac{du}{dt} = -A\,DI(u) + \sqrt{2}\frac{dW}{dt}, \quad u(0) = u_0. \tag{4.12}$$
Theorem 4.11. If $u_0 \sim \mu$ then $u(t) \sim \mu$ for all $t > 0$. More precisely, for all $\varphi : \mathbb{R}^n \to \mathbb{R}^+$ bounded and
continuous, $u_0 \sim \mu$ implies

$$\mathbb{E}\varphi(u(t)) = \mathbb{E}\varphi(u_0), \quad \forall t > 0.$$

Proof. Consider the additive noise SDE, for additive noise with strictly positive-definite diffusion matrix $\Sigma$,

$$\frac{du}{dt} = f(u) + \sqrt{2\Sigma}\frac{dB}{dt}, \quad u(0) = u_0 \sim \nu_0.$$

If $\nu_0$ has pdf $\rho_0$, then the Fokker-Planck equation for this SDE is

$$\frac{\partial\rho}{\partial t} = \nabla\cdot(-f\rho + \Sigma\nabla\rho), \quad (u,t) \in \mathbb{R}^n \times \mathbb{R}^+,$$

$$\rho|_{t=0} = \rho_0.$$

At time $t > 0$ the solution of the SDE is distributed according to measure $\nu(t)$ with density $\rho(u, t)$ solving
the Fokker-Planck equation. Thus the initial measure $\nu_0$ is preserved if

$$\nabla\cdot(-f\rho_0 + \Sigma\nabla\rho_0) = 0$$

and then $\rho(\cdot,t) = \rho_0$, $\forall t \ge 0$.

We apply this Fokker-Planck equation to show that $\mu$ is invariant for equation (4.10). We need to show
that

$$\nabla\cdot\bigl(A\,DI(u)\rho + A\nabla\rho\bigr) = 0$$

if $\rho = e^{-I(u)}$. But then

$$\nabla\rho = -DI(u)e^{-I(u)} = -DI(u)\rho.$$

Thus

$$A\,DI(u)\rho + A\nabla\rho = A\,DI(u)\rho - A\,DI(u)\rho = 0,$$

so that

$$\nabla\cdot\bigl(A\,DI(u)\rho + A\nabla\rho\bigr) = \nabla\cdot(0) = 0.$$

Hence the proof is complete. □
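
The cancellation at the heart of this proof, namely that the probability flux $A\,DI(u)\rho + A\nabla\rho$ vanishes pointwise when $\rho = e^{-I}$, can be checked numerically. The sketch below is an illustration under assumptions not in the text: $n = 1$, the potential $I(u) = u^4/4 + u^2/2$ and the scalar preconditioner $A = 2$ are arbitrary choices, the density is left unnormalised (the flux identity does not depend on the normalisation), and the gradient of $\rho$ is approximated by central finite differences.

```python
import numpy as np

def I(u):
    """Hypothetical potential; exp(-I) is not normalised, which does not affect the flux."""
    return 0.25 * u**4 + 0.5 * u**2

def DI(u):
    """Derivative of the hypothetical potential I."""
    return u**3 + u

A = 2.0                                   # scalar preconditioner (n = 1)
u = np.linspace(-3.0, 3.0, 601)
h = 1e-5                                  # finite-difference step
rho = np.exp(-I(u))
# Central finite difference approximation of the gradient of rho.
grad_rho = (np.exp(-I(u + h)) - np.exp(-I(u - h))) / (2.0 * h)
flux = A * DI(u) * rho + A * grad_rho     # A DI rho + A grad rho
max_flux = np.max(np.abs(flux))
print(max_flux)                           # zero up to finite-difference error
```

Because the flux itself vanishes, and not merely its divergence, the same check works for any symmetric positive definite $A$.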

4.3.2. Motivation for Equation (4.8)

Using the preceding finite dimensional development, we now motivate the form of equation (4.8). For (4.6)
we have, if $\mathcal{H}$ is $\mathbb{R}^n$,

$$\mu(du) = \rho(u)du,$$

$$\rho(u) = \exp\bigl(-I(u)\bigr),$$

$$I(u) = \tfrac12|\mathcal{C}^{-1/2}u|^2 + \Phi(u) + \ln Z.$$

Thus

$$DI(u) = \mathcal{C}^{-1}u + D\Phi(u)$$

and equation (4.9), which preserves $\mu$, is

$$\frac{du}{dt} = -A\bigl(\mathcal{C}^{-1}u + D\Phi(u)\bigr) + \sqrt{2A}\frac{dB}{dt}.$$

Choosing the preconditioner $A = \mathcal{C}$ gives

$$\frac{du}{dt} = -u - \mathcal{C}D\Phi(u) + \sqrt{2\mathcal{C}}\frac{dB}{dt}.$$

This is exactly (4.8) provided $W = \sqrt{\mathcal{C}}B$, where $B$ is a Brownian motion with covariance $I$. Then $W$ is a
Brownian motion with covariance $\mathcal{C}$.
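
A small simulation illustrates the finite dimensional statement. The sketch below is not from the text and makes several assumptions: $\mathcal{H} = \mathbb{R}^3$, the covariance $\mathcal{C}$ is diagonal with invented eigenvalues, and the potential is the quadratic $\Phi(u) = |u|^2/(2\sigma^2)$, chosen only because the target $\mu$ is then Gaussian with known variances $c_j/(1 + c_j/\sigma^2)$. The SDE is discretised by the Euler-Maruyama method, which preserves $\mu$ only approximately, with a bias that vanishes as the step size $h \to 0$.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 3
c = np.array([1.0, 0.25, 0.04])   # eigenvalues of a diagonal prior covariance C (invented)
sigma2 = 0.5                       # parameter of the hypothetical potential

def DPhi(u):
    """Gradient of the hypothetical potential Phi(u) = |u|^2 / (2 * sigma2)."""
    return u / sigma2

post_var = c / (1.0 + c / sigma2)  # exact variances of mu for this Gaussian example

n_chains, n_steps, h = 20000, 500, 0.01
# Start every chain from mu, so invariance predicts that the law does not change.
u = rng.standard_normal((n_chains, n)) * np.sqrt(post_var)
for _ in range(n_steps):
    xi = rng.standard_normal((n_chains, n))
    # Euler-Maruyama step for du = -(u + C DPhi(u)) dt + sqrt(2 C) dB.
    u = u - h * (u + c * DPhi(u)) + np.sqrt(2.0 * h * c) * xi

emp_var = u.var(axis=0)
print("target variances   :", post_var)
print("empirical variances:", emp_var)
```

The empirical variances after time $T = 5$ should agree with the target variances up to Monte Carlo error and the small time-stepping bias.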

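We provide further detail on the construction of $W$, using the discussion in Remark 4.10 to guide us. In
the infinite dimensional case we define a cylindrical Wiener process by

$$B(t) = \sum_{j=1}^\infty \beta_j(t)\phi_j$$

where $\{\beta_j\}_{j=1}^\infty$ is an i.i.d. family of Brownian motions on $\mathbb{R}$ with $\beta_j \in C([0,\infty);\mathbb{R})$. Since $\sqrt{\mathcal{C}}\phi_j = \gamma_j\phi_j$, the
$\mathcal{C}$-Wiener process $W = \sqrt{\mathcal{C}}B$ is then

$$W(t) = \sum_{j=1}^\infty \gamma_j\beta_j(t)\phi_j. \tag{4.13}$$

The following formal calculation gives insight into the properties of $W$:

$$\mathbb{E}\,W(t) \otimes W(s) = \mathbb{E}\Bigl(\sum_{j=1}^\infty\sum_{k=1}^\infty \gamma_j\gamma_k\beta_j(t)\beta_k(s)\,\phi_j \otimes \phi_k\Bigr)$$

$$= \sum_{j=1}^\infty\sum_{k=1}^\infty \gamma_j\gamma_k\,\mathbb{E}\bigl(\beta_j(t)\beta_k(s)\bigr)\phi_j \otimes \phi_k$$

$$= \sum_{j=1}^\infty\sum_{k=1}^\infty \gamma_j\gamma_k\delta_{jk}(t \wedge s)\,\phi_j \otimes \phi_k$$

$$= \sum_{j=1}^\infty \gamma_j^2\,\phi_j \otimes \phi_j\,(t \wedge s)$$

$$= \mathcal{C}(t \wedge s).$$

Thus the process has the covariance structure of Brownian motion in time, and covariance operator $\mathcal{C}$ in
space. Hence the name $\mathcal{C}$-Wiener process.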
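
The covariance identity can be checked by Monte Carlo for a truncated sum. The following sketch is illustrative only: the truncation level, the choice $\gamma_j = j^{-s}$ with $s = 1$, the two time points and the sample size are all assumptions made here, and the process is represented by its coordinates in the eigenbasis $\{\phi_j\}$. The second time point is called `tau` in the code to avoid a clash with the decay exponent `s`.

```python
import numpy as np

rng = np.random.default_rng(1)

s = 1.0                                            # decay exponent, gamma_j = j^{-s}
N = 4                                              # truncation level (modes kept)
gamma = np.arange(1, N + 1, dtype=float) ** (-s)

t, tau = 0.7, 0.3                                  # two time points with tau < t
n_samples = 50000
# Brownian coefficients beta_j(tau), then an independent increment up to time t.
beta_tau = np.sqrt(tau) * rng.standard_normal((n_samples, N))
beta_t = beta_tau + np.sqrt(t - tau) * rng.standard_normal((n_samples, N))
# Coordinates of the truncated process: <W(t), phi_j> = gamma_j * beta_j(t).
W_tau = gamma * beta_tau
W_t = gamma * beta_t

emp_cov = W_t.T @ W_tau / n_samples                # estimate of E[W(t) (x) W(tau)]
exact_cov = np.diag(gamma**2) * min(t, tau)        # C (t ^ tau) in the eigenbasis
print(np.round(emp_cov, 3))
print(exact_cov)
```

The empirical matrix should be close to diagonal with entries $\gamma_j^2\,(t \wedge \tau)$, in line with the formal calculation above.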

In order to make sense of this infinite sum we follow an approach similar to that used in Theorem 2.4 to
make sense of Gaussian random sums. To this end, consider the finite sum

$$W^N(t) = \sum_{j=1}^N \gamma_j\beta_j(t)\phi_j.$$

Let $(\Omega, \mathcal{F}, \mathbb{P})$ denote the probability space underlying the i.i.d. sequence of unit Brownian motions used to
construct $W$.

Theorem 4.12. The sequence of functions $\{W^N\}_{N=1}^\infty$ is Cauchy in the Banach space $L^2_{\mathbb{P}}(\Omega; C([0,T];\mathcal{H}^t))$,
$t < s - \tfrac12$. Thus the infinite series (4.13) exists as an $L^2$-limit and takes values in $C([0,T];\mathcal{H}^t)$ for $t < s - \tfrac12$.

We are now in a position to prove Theorems 4.17 and 4.19 which are the infinite dimensional analogues
of Theorems 4.9 and 4.11.



4.3.3. Assumptions on Change of Measure

Recall that $\mu_0(\mathcal{H}^r) = 1$ for all $r \in [0, s - \tfrac12)$. The functional $\Phi(\cdot)$ is assumed to be defined on $\mathcal{H}^t$ for some
$t \in [0, s - \tfrac12)$, and indeed we will assume appropriate bounds on the first and second derivatives, building on
this assumption. These regularity assumptions on $\Phi(\cdot)$ ensure that the probability distribution $\mu$ is not
too different from $\mu_0$, when projected into directions associated with $\phi_j$ for $j$ large.
For each $u \in \mathcal{H}^t$ the derivative $D\Phi(u)$ is an element of the dual $(\mathcal{H}^t)^*$ of $\mathcal{H}^t$ comprising continuous linear
functionals on $\mathcal{H}^t$. However, we may identify $(\mathcal{H}^t)^*$ with $\mathcal{H}^{-t}$ and view $D\Phi(u)$ as an element of $\mathcal{H}^{-t}$ for each
$u \in \mathcal{H}^t$. With this identification, the following identity holds

$$\|D\Phi(u)\|_{\mathcal{L}(\mathcal{H}^t,\mathbb{R})} = \|D\Phi(u)\|_{-t}$$

and the second derivative $D^2\Phi(u)$ can be identified as an element of $\mathcal{L}(\mathcal{H}^t, \mathcal{H}^{-t})$. To avoid technicalities
we assume that $\Phi(\cdot)$ is quadratically bounded, with first derivative linearly bounded and second derivative
globally bounded. Weaker assumptions could be dealt with by use of stopping time arguments.



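Assumptions 4.13. There exist constants $M_i \in \mathbb{R}$, $i \le 4$ and $t \in [0, s - 1/2)$ such that, for all $u \in \mathcal{H}^t$, the
functional $\Phi : \mathcal{H}^t \to \mathbb{R}$ satisfies

$$-M_1 \le \Phi(u) \le M_2(1 + \|u\|_t^2);$$

$$\|D\Phi(u)\|_{-t} \le M_3(1 + \|u\|_t);$$

$$\|D^2\Phi(u)\|_{\mathcal{L}(\mathcal{H}^t,\mathcal{H}^{-t})} \le M_4.$$

Example. The functional $\Phi(u) = \tfrac12\|u\|_t^2$ satisfies Assumptions 4.13. It is defined on $\mathcal{H}^t$ and its derivative
at $u \in \mathcal{H}^t$ is given by $D\Phi(u) = \sum_{j \ge 1} j^{2t}u_j\phi_j \in \mathcal{H}^{-t}$ with $\|D\Phi(u)\|_{-t} = \|u\|_t$. The second derivative
$D^2\Phi(u) \in \mathcal{L}(\mathcal{H}^t, \mathcal{H}^{-t})$ is the linear operator that maps $u \in \mathcal{H}^t$ to $\sum_{j \ge 1} j^{2t}\langle u, \phi_j\rangle\phi_j \in \mathcal{H}^{-t}$; its norm satisfies
$\|D^2\Phi(u)\|_{\mathcal{L}(\mathcal{H}^t,\mathcal{H}^{-t})} = 1$ for any $u \in \mathcal{H}^t$. □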

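Since the eigenvalues $\gamma_j^2$ of $\mathcal{C}$ decrease as $\gamma_j \asymp j^{-s}$, the operator $\mathcal{C}$ has a smoothing effect: $\mathcal{C}^\alpha h$ gains $2\alpha s$
orders of regularity in the sense that the $\mathcal{H}^\beta$-norm of $\mathcal{C}^\alpha h$ is controlled by the $\mathcal{H}^{\beta - 2\alpha s}$-norm of $h \in \mathcal{H}$. Indeed
we have the following:

Lemma 4.14. Under Assumptions 4.13, the following estimates hold:

1. The operator $\mathcal{C}$ satisfies

$$\|\mathcal{C}^\alpha h\|_\beta \asymp \|h\|_{\beta - 2\alpha s}.$$

2. The function $\mathcal{C}D\Phi : \mathcal{H}^t \to \mathcal{H}^t$ is globally Lipschitz on $\mathcal{H}^t$: there exists a constant $M_5 > 0$ such that

$$\|\mathcal{C}D\Phi(u) - \mathcal{C}D\Phi(v)\|_t \le M_5\|u - v\|_t \quad \forall u, v \in \mathcal{H}^t.$$

3. The function $F : \mathcal{H}^t \to \mathcal{H}^t$ defined by

$$F(u) = -u - \mathcal{C}D\Phi(u) \tag{4.14}$$

is globally Lipschitz on $\mathcal{H}^t$.
4. The functional $\Phi(\cdot) : \mathcal{H}^t \to \mathbb{R}$ satisfies a second order Taylor formula$^1$. There exists a constant $M_6 > 0$
such that

$$\Phi(v) - \bigl(\Phi(u) + \langle D\Phi(u), v - u\rangle\bigr) \le M_6\|u - v\|_t^2 \quad \forall u, v \in \mathcal{H}^t. \tag{4.15}$$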



4.3.4. Finite Dimensional Approximation

Our analysis now proceeds as follows. First we introduce an approximation of the measure $\mu$, denoted by
$\mu^N$. To this end we let $P^N$ denote orthogonal projection in $\mathcal{H}$ onto $X^N := {\rm span}\{\phi_1, \dots, \phi_N\}$ and denote by
$Q^N$ orthogonal projection in $\mathcal{H}$ onto $X^\perp := {\rm span}\{\phi_{N+1}, \phi_{N+2}, \dots\}$. Thus $Q^N = I - P^N$. Then define the
measure $\mu^N$ by

$$\frac{d\mu^N}{d\mu_0}(u) = \frac{1}{Z^N}\exp\bigl(-\Phi(P^Nu)\bigr), \tag{4.16a}$$

$$Z^N = \int_{\mathcal{H}} \exp\bigl(-\Phi(P^Nu)\bigr)\mu_0(du). \tag{4.16b}$$

$^1$ We extend $\langle\cdot,\cdot\rangle$ from an inner-product on $\mathcal{H}$ to the dual pairing between $\mathcal{H}^{-t}$ and $\mathcal{H}^t$.
This is a specific example of the approximating family in (4.4) if we define

$$\Phi^N = \Phi \circ P^N. \tag{4.17}$$

Indeed if we take $X = \mathcal{H}^\tau$ for any $\tau \in (t, s - \tfrac12)$ we see that $\|P^N\|_{\mathcal{L}(X,X)} = 1$ and that, for any $u \in X$,

$$|\Phi(u) - \Phi^N(u)| = |\Phi(u) - \Phi(P^Nu)|$$

$$\le M_3(1 + \|u\|_t)\|(I - P^N)u\|_t$$

$$\le CM_3(1 + \|u\|_\tau)\|u\|_\tau N^{-(\tau - t)}.$$

Since $\Phi$, and hence $\Phi^N$, are bounded below by $-M_1$, and since the function $1 + \|u\|_\tau^2$ is integrable by
the Fernique Theorem 2.6, the approximation Theorem 4.7 applies. We deduce that the Hellinger distance
between $\mu$ and $\mu^N$ is bounded above by $O(N^{-r})$ for any $r < s - \tfrac12 - t$ since $\tau - t \in (0, s - \tfrac12 - t)$.
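
The projection estimate $\|(I - P^N)u\|_t \le N^{-(\tau - t)}\|u\|_\tau$ used in the last display follows directly from the definition of the norms, and can be checked on a truncated coefficient sequence. The sketch below is illustrative: the exponents, the number of retained modes and the random coefficients are assumptions made here, and `norm_r` is a hypothetical helper implementing $\|\cdot\|_r$ in the eigenbasis.

```python
import numpy as np

rng = np.random.default_rng(2)

J = 2000                                  # modes retained to mimic an infinite sequence
j = np.arange(1, J + 1, dtype=float)
t, tau = 0.2, 0.9                         # hypothetical exponents with t < tau
# Hypothetical coefficients u_j = <u, phi_j>, decaying so that ||u||_tau is finite.
coeffs = rng.standard_normal(J) * j ** (-(tau + 1.0))

def norm_r(c, r):
    """The norm ||.||_r induced by the inner product sum_j j^{2r} u_j v_j."""
    return np.sqrt(np.sum(j ** (2.0 * r) * c**2))

for N in (10, 100, 1000):
    tail = coeffs.copy()
    tail[:N] = 0.0                        # (I - P^N) u keeps only the modes j > N
    lhs = norm_r(tail, t)
    rhs = N ** (-(tau - t)) * norm_r(coeffs, tau)
    print(f"N = {N:5d}   ||(I - P^N)u||_t = {lhs:.3e}   bound = {rhs:.3e}")
```

The left-hand side never exceeds the bound, and both decay as $N$ grows, which is the mechanism behind the $O(N^{-r})$ rate quoted above.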

We will not use this explicit convergence rate in what follows, but we will use the idea that $\mu^N$ converges
to $\mu$ in order to prove invariance of the measure $\mu$ under the SDE (4.8). The measure $\mu^N$ has a product
structure that we will exploit in the following. We note that any element $u \in \mathcal{H}$ is uniquely decomposed as
$u = p + q$ where $p \in X^N$ and $q \in X^\perp$. Thus we will write $\mu^N(du) = \mu^N(dp, dq)$, and similar expressions for
$\mu_0$ and so forth, in what follows.

Lemma 4.15. Define $\mathcal{C}^N = P^N\mathcal{C}P^N$ and $\mathcal{C}^\perp = Q^N\mathcal{C}Q^N$. Then $\mu_0$ factors as the independent product of
measures $\mu_{0,P} = N(0, \mathcal{C}^N)$ and $\mu_{0,Q} = N(0, \mathcal{C}^\perp)$ on $X^N$ and $X^\perp$ respectively. Furthermore $\mu^N$ itself also
factors as an independent product on these two spaces: $\mu^N(dp, dq) = \mu_P(dp)\mu_Q(dq)$ with $\mu_Q = \mu_{0,Q}$ and

$$\frac{d\mu_P}{d\mu_{0,P}}(u) \propto \exp\bigl(-\Phi(p)\bigr).$$

Proof. Because $P^N$ and $Q^N$ commute with $\mathcal{C}$, and because $P^NQ^N = Q^NP^N = 0$, the factorization of the
reference measure $\mu_0$ follows automatically. The factorization of the measure $\mu^N$ follows from the fact that
$\Phi^N(u) = \Phi(p)$ and hence does not depend on $q$. □

To facilitate the proof of the desired measure preservation property, we introduce the equation

$$\frac{du^N}{dt} = -u^N - \mathcal{C}P^ND\Phi^N(u^N) + \sqrt{2}\frac{dW}{dt}. \tag{4.18}$$

By using well-known properties of finite dimensional SDEs, we will show that, if $u^N(0) \sim \mu^N$, then $u^N(t) \sim
\mu^N$ for any $t > 0$. By passing to the limit $N \to \infty$ we will deduce that for (4.8), if $u(0) \sim \mu$, then $u(t) \sim \mu$
for any $t > 0$.

The next lemma gathers various regularity estimates on the functional $\Phi^N(\cdot)$ that are repeatedly used in
the sequel; they follow from the analogous properties of $\Phi$ by using the structure $\Phi^N = \Phi \circ P^N$.

Lemma 4.16. Under Assumptions 4.13, the following estimates hold with all constants uniform in $N$:

1. The estimates of Assumptions 4.13 hold with $\Phi$ replaced by $\Phi^N$.

2. The function $\mathcal{C}D\Phi^N : \mathcal{H}^t \to \mathcal{H}^t$ is globally Lipschitz on $\mathcal{H}^t$: there exists a constant $M_5 > 0$ such that

$$\|\mathcal{C}D\Phi^N(u) - \mathcal{C}D\Phi^N(v)\|_t \le M_5\|u - v\|_t \quad \forall u, v \in \mathcal{H}^t.$$

3. The function $F^N : \mathcal{H}^t \to \mathcal{H}^t$ defined by

$$F^N(u) = -u - \mathcal{C}P^ND\Phi^N(u) \tag{4.19}$$

is globally Lipschitz on $\mathcal{H}^t$.
4. The functional $\Phi^N(\cdot) : \mathcal{H}^t \to \mathbb{R}$ satisfies a second order Taylor formula$^2$. There exists a constant
$M_6 > 0$ such that

$$\Phi^N(v) - \bigl(\Phi^N(u) + \langle D\Phi^N(u), v - u\rangle\bigr) \le M_6\|u - v\|_t^2 \quad \forall u, v \in \mathcal{H}^t. \tag{4.20}$$

$^2$ We extend $\langle\cdot,\cdot\rangle$ from an inner-product on $\mathcal{H}$ to the dual pairing between $\mathcal{H}^{-t}$ and $\mathcal{H}^t$.
4.3.5. Main Theorem and Proof

We define a solution of (4.8) to be a function $u \in C([0,T]; \mathcal{H}^t)$ satisfying the integral equation

$$u(\tau) = u_0 + \int_0^\tau F(u(s))\,ds + \sqrt{2}W(\tau) \quad \forall\tau \in [0,T]. \tag{4.21}$$

The solution is said to be global if $T > 0$ is arbitrary. Similarly a solution of (4.18) is a function $u^N \in
C([0,T]; \mathcal{H}^t)$ satisfying the integral equation

$$u^N(\tau) = u_0 + \int_0^\tau F^N(u^N(s))\,ds + \sqrt{2}W(\tau) \quad \forall\tau \in [0,T]. \tag{4.22}$$

The following establishes basic existence, uniqueness, continuity and approximation properties of the
solutions of (4.21) and (4.22).

Theorem 4.17. For every $u_0 \in \mathcal{H}^t$ and for almost every $\mathcal{C}$-Wiener process $W$, equation (4.21) (respectively
(4.22)) has a unique global solution. For any pair $(u_0, W) \in \mathcal{H}^t \times C([0,T]; \mathcal{H}^t)$ we define the Itô map

$$\Theta : \mathcal{H}^t \times C([0,T]; \mathcal{H}^t) \to C([0,T]; \mathcal{H}^t)$$

which maps $(u_0, W)$ to the unique solution $u$ (resp. $u^N$ for (4.22)) of the integral equation (4.21) (resp. $\Theta^N$ for
(4.22)). The map $\Theta$ (resp. $\Theta^N$) is globally Lipschitz continuous. Finally we have that $\Theta^N(u_0, W) \to \Theta(u_0, W)$
for every pair $(u_0, W) \in \mathcal{H}^t \times C([0,T]; \mathcal{H}^t)$.

Proof. The existence and uniqueness of local solutions to the integral equation (4.21) is a simple application
of the contraction mapping principle, following arguments similar to those employed when studying the Itô
map below. Extension to a global solution may be achieved by repeating the local argument on successive
intervals.

Now let $u^{(i)}$ solve

$$u^{(i)}(\tau) = u_0^{(i)} + \int_0^\tau F(u^{(i)}(s))\,ds + \sqrt{2}W^{(i)}(\tau), \quad \tau \in [0,T],$$

for $i = 1,2$. Subtracting and using the Lipschitz property of $F$ shows that $e = u^{(1)} - u^{(2)}$ satisfies

$$\|e(\tau)\|_t \le \|u_0^{(1)} - u_0^{(2)}\|_t + L\int_0^\tau \|e(s)\|_t\,ds + \sqrt{2}\|W^{(1)}(\tau) - W^{(2)}(\tau)\|_t$$

$$\le \|u_0^{(1)} - u_0^{(2)}\|_t + L\int_0^\tau \|e(s)\|_t\,ds + \sqrt{2}\sup_{0 \le s \le T}\|W^{(1)}(s) - W^{(2)}(s)\|_t.$$

By application of the Gronwall inequality we find that

$$\sup_{0 \le \tau \le T}\|e(\tau)\|_t \le C(T)\Bigl(\|u_0^{(1)} - u_0^{(2)}\|_t + \sup_{0 \le s \le T}\|W^{(1)}(s) - W^{(2)}(s)\|_t\Bigr)$$

and the desired continuity is established.

Now we prove pointwise convergence of $\Theta^N$ to $\Theta$. Let $e = u - u^N$ where $u$ and $u^N$ solve (4.21), (4.22)
respectively. The pointwise convergence of $\Theta^N$ to $\Theta$ is established by proving that $e \to 0$ in $C([0,T]; \mathcal{H}^t)$.
Note that

$$F(u) - F^N(u^N) = \bigl(F^N(u) - F^N(u^N)\bigr) + \bigl(F(u) - F^N(u)\bigr).$$

Also, by Lemma 4.16, $\|F^N(u) - F^N(u^N)\|_t \le L\|e\|_t$. Thus we have

$$\|e(\tau)\|_t \le L\int_0^\tau \|e(s)\|_t\,ds + \int_0^\tau \|F(u(s)) - F^N(u(s))\|_t\,ds.$$

Thus, by Gronwall, it suffices to show that

$$\delta^N := \sup_{0 \le s \le T}\|F(u(s)) - F^N(u(s))\|_t$$

tends to zero as $N \to \infty$. Note that

$$F^N(u) - F(u) = \mathcal{C}D\Phi(u) - \mathcal{C}P^ND\Phi(P^Nu)$$

$$= (I - P^N)\mathcal{C}D\Phi(u) + P^N\bigl(\mathcal{C}D\Phi(u) - \mathcal{C}D\Phi(P^Nu)\bigr).$$

Thus, since $\mathcal{C}D\Phi$ is globally Lipschitz on $\mathcal{H}^t$, by Lemma 4.14, and $P^N$ has norm one as a mapping from $\mathcal{H}^t$
into itself,

$$\|F(u) - F^N(u)\|_t \le \|(I - P^N)\mathcal{C}D\Phi(u)\|_t + C\|(I - P^N)u\|_t.$$

By dominated convergence $\|(I - P^N)a\|_t \to 0$ for any fixed element $a \in \mathcal{H}^t$. Thus, because $\mathcal{C}D\Phi$ is globally
Lipschitz, by Lemma 4.14, and because $u \in C([0,T]; \mathcal{H}^t)$, we deduce that it suffices to bound $\sup_{0 \le s \le T}\|u(s)\|_t$.
But such a bound is a consequence of the existence Theorem 4.17. □
The following is a straightforward corollary of the preceding theorem:

Corollary 4.18. For any pair $(u_0, W) \in \mathcal{H}^t \times C([0,T]; \mathcal{H}^t)$ we define the point Itô map

$$\Theta_t : \mathcal{H}^t \times C([0,T]; \mathcal{H}^t) \to \mathcal{H}^t$$

which maps $(u_0, W)$ to the unique solution $u(t)$ of the integral equation (4.21) (resp. $u^N(t)$ for (4.22)) at
time $t$ (resp. $\Theta_t^N$ for (4.22)). The map $\Theta_t$ (resp. $\Theta_t^N$) is globally Lipschitz continuous. Finally we have that
$\Theta_t^N(u_0, W) \to \Theta_t(u_0, W)$ for every pair $(u_0, W) \in \mathcal{H}^t \times C([0,T]; \mathcal{H}^t)$.

Theorem 4.19. Let Assumptions 4.13 hold. Then the measure $\mu$ given by (4.3) is invariant for (4.8): for
all continuous bounded functions $\varphi : \mathcal{H}^t \to \mathbb{R}$ it follows that, if $\mathbb{E}$ denotes expectation with respect to the
product measure found from initial condition $u_0 \sim \mu$ and $W \sim \mathbb{W}$, the $\mathcal{C}$-Wiener measure on $\mathcal{H}^t$, then
$\mathbb{E}\varphi(u(t)) = \mathbb{E}\varphi(u_0)$.

Proof. We have that 

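\[ \mathbb{E}\varphi(u(t)) = \int \varphi(\Theta_t(u_0, W)) \, \mu(du_0) \, \mathbb{W}(dW), \tag{4.23} \]

\[ \mathbb{E}\varphi(u_0) = \int \varphi(u_0) \, \mu(du_0). \tag{4.24} \]

If we solve equation (4.18) with $u_0 \sim \mu^N$ then, using $\mathbb{E}^N$ with the obvious notation,

\[ \mathbb{E}^N\varphi(u^N(t)) = \int \varphi(\Theta_t^N(u_0, W)) \, \mu^N(du_0) \, \mathbb{W}(dW), \tag{4.25} \]

\[ \mathbb{E}^N\varphi(u_0) = \int \varphi(u_0) \, \mu^N(du_0). \tag{4.26} \]

Lemma 4.20 below shows that, in fact,

\[ \mathbb{E}^N\varphi(u^N(t)) = \mathbb{E}^N\varphi(u_0). \]

Thus it suffices to show that

\[ \mathbb{E}^N\varphi(u^N(t)) \to \mathbb{E}\varphi(u(t)) \tag{4.27} \]

and

\[ \mathbb{E}^N\varphi(u_0) \to \mathbb{E}\varphi(u_0). \tag{4.28} \]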
Both of these facts follow from the dominated convergence theorem as we now show. First note that 

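\[ \mathbb{E}^N\varphi(u_0) = \int \varphi(u_0) \, e^{-\Phi(P^N u_0)} \, \mu_0(du_0). \]

Since $\varphi(\cdot) e^{-\Phi \circ P^N}$ is bounded independently of $N$, by $(\sup \varphi) e^{M_1}$, and since $(\Phi \circ P^N)(u)$ converges pointwise
to $\Phi(u)$ on $X^t$, we deduce that

\[ \mathbb{E}^N\varphi(u_0) \to \int \varphi(u_0) \, e^{-\Phi(u_0)} \, \mu_0(du_0) = \mathbb{E}\varphi(u_0) \]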
so that (4.28) holds. The convergence in (4.27) holds by a similar argument. From (4.25) we have

\[ \mathbb{E}^N\varphi(u^N(t)) = \int \varphi(\Theta_t^N(u_0, W)) \, e^{-\Phi(P^N u_0)} \, \mu_0(du_0) \, \mathbb{W}(dW). \tag{4.29} \]

The integrand is again dominated by $(\sup \varphi) e^{M_1}$. Using the pointwise convergence of $\Theta_t^N$ to $\Theta_t$ on $X^t \times
C([0,T]; X^t)$, as proved in Corollary 4.18, as well as the pointwise convergence of $(\Phi \circ P^N)(u)$ to $\Phi(u)$, the
desired result follows from dominated convergence: we find that

\[ \mathbb{E}^N\varphi(u^N(t)) \to \int \varphi(\Theta_t(u_0, W)) \, e^{-\Phi(u_0)} \, \mu_0(du_0) \, \mathbb{W}(dW) = \mathbb{E}\varphi(u(t)). \]

The desired result follows. □

Lemma 4.20. Let Assumptions 4.13 hold. Then the measure $\mu^N$ given by (4.16) is invariant for (4.18):
for all continuous bounded functions $\varphi : X^t \to \mathbb{R}$ it follows that, if $\mathbb{E}^N$ denotes expectation with respect to
the product measure found from initial condition $u_0 \sim \mu^N$ and $W \sim \mathbb{W}$, the $C$-Wiener measure on $X^t$, then
$\mathbb{E}^N\varphi(u^N(t)) = \mathbb{E}^N\varphi(u_0)$.

Proof. Recall from Lemma 4.15 that the measure $\mu^N$ given by (4.16) factors as the independent product of two
measures, $\mu_P$ on $X^N$ and $\mu_Q$ on $X^\perp$. On $X^\perp$ the measure is simply the Gaussian $\mu_Q = N(0, C^\perp)$, whilst on
$X^N$ the measure $\mu_P$ is finite dimensional with density proportional to

\[ \exp\Bigl( -\Phi(p) - \tfrac12 \|(C^N)^{-1/2} p\|^2 \Bigr). \tag{4.30} \]

The equation (4.18) also decouples on the spaces $X^N$ and $X^\perp$. On $X^\perp$ it is simply

\[ \frac{dq}{dt} = -q + \sqrt{2}\,(I - P^N) \frac{dW}{dt} \tag{4.31} \]

whilst on $X^N$ it is

\[ \frac{dp}{dt} = -p - C^N D\Phi(p) + \sqrt{2}\, P^N \frac{dW}{dt}. \tag{4.32} \]

Measure $\mu_Q$ is preserved by (4.31), because (4.31) simply gives an Ornstein-Uhlenbeck process with desired
Gaussian invariant measure. On the other hand, equation (4.32) is simply a Langevin equation for the measure on
$\mathbb{R}^N$ with density (4.30) and a calculation with the Fokker-Planck equation, as in Theorem 4.11, demonstrates
the required invariance of $\mu_P$. □
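
The Fokker-Planck calculation referred to at the end of this proof can be sketched as follows, in the notation of (4.30) and (4.32). The symbols $I^N$, $\rho$ and $\rho_\infty$ are introduced here for illustration only, and the sketch assumes that $\Phi$ is differentiable and that the noise $P^N W$ has covariance operator $C^N$, so that the diffusion matrix in (4.32) is $2C^N$.

```latex
% Write I^N(p) = \Phi(p) + \tfrac12 \|(C^N)^{-1/2} p\|^2, so that the density (4.30) is \exp(-I^N(p)).
\begin{align*}
  \nabla I^N(p) &= D\Phi(p) + (C^N)^{-1} p
    \quad\Longrightarrow\quad
    -p - C^N D\Phi(p) = -C^N \nabla I^N(p), \\
  \partial_t \rho &= \nabla \cdot \bigl( C^N \nabla I^N \, \rho + C^N \nabla \rho \bigr)
    \qquad \text{(Fokker-Planck equation for (4.32))}, \\
  \rho_\infty \propto e^{-I^N}
    &\quad\Longrightarrow\quad
    C^N \nabla \rho_\infty = -C^N \nabla I^N \, \rho_\infty
    \quad\Longrightarrow\quad
    \partial_t \rho_\infty = 0 .
\end{align*}
```

The probability flux vanishes identically for $\rho_\infty$, which is the stationarity statement needed on $X^N$.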



4.4. MCMC Methods 



The perspective that we have described on inverse problems leads to new sampling methods which are 
specifically tailored to the infinite dimensional setting, and its approximation by finite dimensional measures. 
In particular it leads naturally to the design of algorithms which perform well under refinement of the finite
dimensionalization. To illustrate this idea we consider the setting of Section 4.3 and study random walk type
algorithms.

First of all we describe the standard Random Walk Metropolis (RWM) algorithm, designed to sample
a measure on $\mathbb{R}^N$. To this end we notice that the measure $\mu^N$ given by (4.16) factors as the product of
two independent measures on $X^N$ and $\mathcal{H} \setminus X^N$. The measure on $\mathcal{H} \setminus X^N$ is given by the prior and is easily
sampled. Thus it remains to sample the measure on $X^N$. This space is isomorphic to $\mathbb{R}^N$. We define

\[ I(u) = \Phi(u) + \tfrac12 \|C^{-1/2} u\|^2. \tag{4.33} \]

Then, for $u \in X^N$, the measure of interest has Lebesgue density

\[ \pi^N(u) \propto \exp(-I(u)). \]

This standard RWM algorithm defines a Markov chain $\{u^{(k)}\}$ on $X^N$ as follows.
• Set $k = 0$ and pick $u^{(0)} \in X^N$.

• Propose $v^{(k)} = u^{(k)} + \beta P^N \xi^{(k)}$, $\xi^{(k)} \sim N(0, C)$.

• Set $u^{(k+1)} = v^{(k)}$ with probability $a(u^{(k)}, v^{(k)})$, independently of $u^{(k)}, \xi^{(k)}$.

• Set $u^{(k+1)} = u^{(k)}$ otherwise.

• $k \to k + 1$.

Here 

\[ a(u, v) = \min\{1, \exp(I(u) - I(v))\}. \]
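
The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: it assumes that $X^N$ is identified with $\mathbb{R}^N$ through the eigenbasis of $C$, so that $C$ acts as `diag(lam)`; the names `rwm_chain`, `phi` and `lam`, and the choice to draw $u^{(0)}$ from $N(0, C)$, are all hypothetical.

```python
import math
import random

def rwm_chain(phi, lam, beta, n_steps, rng):
    """Standard RWM on X^N, in coordinates where C = diag(lam) (an assumption).

    phi  : callable returning the potential Phi(u) for a list u
    lam  : list of the first N eigenvalues of C (assumed positive)
    beta : proposal step size
    """
    def I(u):
        # I(u) = Phi(u) + 0.5 * ||C^{-1/2} u||^2, cf. (4.33)
        return phi(u) + 0.5 * sum(x * x / l for x, l in zip(u, lam))

    # u^(0): drawn from N(0, C) for convenience; any point of X^N would do
    u = [math.sqrt(l) * rng.gauss(0.0, 1.0) for l in lam]
    chain, accepted = [u], 0
    for _ in range(n_steps):
        xi = [math.sqrt(l) * rng.gauss(0.0, 1.0) for l in lam]  # xi ~ N(0, C)
        v = [x + beta * z for x, z in zip(u, xi)]                # proposal v^(k)
        a = math.exp(min(0.0, I(u) - I(v)))                      # a(u, v)
        if rng.random() < a:
            u, accepted = v, accepted + 1                        # accept
        chain.append(u)                                          # else keep u^(k)
    return chain, accepted / n_steps

# Usage with a hypothetical spectrum and potential
rng = random.Random(0)
lam = [1.0 / j**2 for j in range(1, 11)]
chain, rate = rwm_chain(lambda u: 0.5 * (u[0] - 1.0) ** 2, lam, 0.2, 500, rng)
print(f"acceptance rate: {rate:.2f}")
```

The acceptance test uses the full functional $I$, including the quadratic term $\tfrac12\|C^{-1/2}u\|^2$.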

This Markov chain leaves the density $\pi^N$ as defined above invariant. It is, however, badly behaved in the
limit $N \to \infty$. This is because

\[ \lim_{N \to \infty} I(P^N u) = \infty \]

almost surely for $u \sim \mu$.
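
One way to see this divergence is through the Karhunen-Loeve expansion. The sketch below rests on several assumptions that are not restated in this section: $(\lambda_j, \varphi_j)$ denote the eigenpairs of $C$, $P^N$ projects onto the span of $\varphi_1, \dots, \varphi_N$, and $\Phi \ge -M_1$. It is written for $u \sim \mu_0 = N(0, C)$ and transfers to $u \sim \mu$ because $\mu$ is absolutely continuous with respect to $\mu_0$.

```latex
\begin{align*}
  u &= \sum_{j \ge 1} \sqrt{\lambda_j}\, \xi_j \varphi_j ,
      \qquad \xi_j \sim N(0,1) \ \text{i.i.d.}, \\
  \tfrac12 \|C^{-1/2} P^N u\|^2 &= \tfrac12 \sum_{j=1}^{N} \xi_j^2 ,
      \qquad \frac{1}{N} \sum_{j=1}^{N} \xi_j^2 \to 1
      \ \text{almost surely (strong law of large numbers)}, \\
  I(P^N u) &\ge -M_1 + \tfrac12 \sum_{j=1}^{N} \xi_j^2 \;\longrightarrow\; \infty .
\end{align*}
```

Under these assumptions the quadratic part of $I(P^N u)$ grows like $N/2$.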

To overcome this issue we introduce a new RWM algorithm which is defined on the whole of $\mathcal{H}$, not just
on finite truncations. The algorithm is defined as follows, when applied on $X^N$:

• Set $k = 0$ and pick $u^{(0)} \in X^N$.

• Propose $v^{(k)} = \sqrt{1 - \beta^2}\, u^{(k)} + \beta P^N \xi^{(k)}$, $\xi^{(k)} \sim N(0, C)$.

• Set $u^{(k+1)} = v^{(k)}$ with probability $a(u^{(k)}, v^{(k)})$, independently of $u^{(k)}$ and $\xi^{(k)}$.

• Set $u^{(k+1)} = u^{(k)}$ otherwise.

• $k \to k + 1$.



Here

\[ a(u, v) = \min\{1, \exp(\Phi(u) - \Phi(v))\}. \]
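
A corresponding sketch of the new RWM algorithm follows. As before this is an illustration under stated assumptions rather than a reference implementation: $C$ is taken to be `diag(lam)` in its eigenbasis, $0 < \beta \le 1$, and the names `new_rwm_chain`, `phi` and `lam` are hypothetical.

```python
import math
import random

def new_rwm_chain(phi, lam, beta, n_steps, rng):
    """New RWM on X^N, in coordinates where C = diag(lam) (an assumption).

    Requires 0 < beta <= 1 so that sqrt(1 - beta^2) is real.
    """
    def sample_prior():
        # one draw from N(0, C) restricted to X^N
        return [math.sqrt(l) * rng.gauss(0.0, 1.0) for l in lam]

    shrink = math.sqrt(1.0 - beta**2)
    u = sample_prior()                       # u^(0); any point of X^N would do
    chain, accepted = [u], 0
    for _ in range(n_steps):
        xi = sample_prior()                  # xi ~ N(0, C)
        v = [shrink * x + beta * z for x, z in zip(u, xi)]
        a = math.exp(min(0.0, phi(u) - phi(v)))   # only Phi enters a(u, v)
        if rng.random() < a:
            u, accepted = v, accepted + 1
        chain.append(u)
    return chain, accepted / n_steps

# Usage with a hypothetical spectrum and potential
rng = random.Random(0)
lam = [1.0 / j**2 for j in range(1, 11)]
chain, rate = new_rwm_chain(lambda u: 0.5 * (u[0] - 1.0) ** 2, lam, 0.2, 500, rng)
print(f"acceptance rate: {rate:.2f}")
```

With $\Phi \equiv 0$ every proposal is accepted, reflecting the fact that this proposal preserves the Gaussian $N(0, C)$.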



Notice that the small change in proposal, when compared with the standard RWM, results in an acceptance
probability defined via differences of $\Phi$ and not $I$. Because $\Phi$ is a.s. finite with respect to $\mu$, whilst $I$ is
not, this leads to a considerably improved algorithm which has desirable $N$-independent properties when
implemented on a sequence of approximating problems with $N \to \infty$.
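
The contrast can be checked numerically in the simplest case $\Phi \equiv 0$, where the target is the prior itself. The sketch below estimates the mean acceptance probability at stationarity for a fixed $\beta$ and increasing $N$. It works in whitened coordinates $z = C^{-1/2}u$, in which the eigenvalues of $C$ cancel; the function name `mean_acceptance` and the parameter values are hypothetical.

```python
import math
import random

def mean_acceptance(n_dim, beta, n_samples, rng, new_proposal):
    """Monte Carlo estimate of E a(u, v) for u ~ N(0, C) and Phi = 0."""
    total = 0.0
    for _ in range(n_samples):
        log_ratio = 0.0
        if not new_proposal:
            # standard RWM: I(u) - I(v) = 0.5*|z|^2 - 0.5*|z + beta*w|^2
            for _j in range(n_dim):
                z = rng.gauss(0.0, 1.0)
                w = rng.gauss(0.0, 1.0)
                log_ratio += 0.5 * (z * z - (z + beta * w) ** 2)
        # new RWM: Phi(u) - Phi(v) = 0, so log_ratio stays at 0
        total += math.exp(min(0.0, log_ratio))
    return total / n_samples

rng = random.Random(0)
for n_dim in (10, 100, 1000):
    std = mean_acceptance(n_dim, 0.2, 300, rng, new_proposal=False)
    new = mean_acceptance(n_dim, 0.2, 300, rng, new_proposal=True)
    print(f"N={n_dim:5d}  standard RWM: {std:.3f}  new RWM: {new:.3f}")
```

For the standard proposal the log acceptance ratio has mean $-\beta^2 N/2$, so at fixed $\beta$ the acceptance probability decays as $N$ grows, whereas for the new proposal it is identically one in this special case.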

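To quantify this it is useful to introduce the concept of spectral gap. Define the spaces

\[ L^2_\mu = \bigl\{ f : X \to \mathbb{R} : \|f\|_2^2 := \mathbb{E}^\mu |f(u)|^2 < \infty \bigr\}, \qquad
   L^2_0 = \bigl\{ f \in L^2_\mu : \mu(f) = 0 \bigr\}. \]

Define the Markov kernel

\[ (Pf)(u) = \mathbb{E}\bigl( f(u^{(1)}) \,\big|\, u^{(0)} = u \bigr). \]

Then set

\[ \|P\|_{L^2_0 \to L^2_0} := \sup_{f \in L^2_0} \frac{\|Pf\|_2}{\|f\|_2}. \]

We have an $L^2_\mu$-spectral gap $\gamma$ if $\|P\|_{L^2_0 \to L^2_0} < 1 - \gamma$. Clearly $\gamma \in (0, 1)$. Furthermore, the bigger $\gamma$ the better
the performance of the algorithm.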
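
To make the definition concrete, the following toy computation evaluates the ratio $\|Pf\|_2 / \|f\|_2$ for a two-state Markov chain, which is not one of the chains studied here. The transition probabilities `p`, `q` and the helper names are hypothetical. On two states every mean-zero function is a multiple of a single eigenfunction of $P$ with eigenvalue $1 - p - q$, so the ratio is the same for every $f \in L^2_0$ and equals the operator norm.

```python
import math

def l2_norm(f, pi):
    # ||f||_2 = sqrt(E^pi |f|^2)
    return math.sqrt(sum(w * x * x for w, x in zip(pi, f)))

def apply_kernel(P, f):
    # (Pf)(i) = E[f(u^(1)) | u^(0) = i] = sum_j P[i][j] * f[j]
    n = len(f)
    return [sum(P[i][j] * f[j] for j in range(n)) for i in range(n)]

def center(f, pi):
    # subtract the mean pi(f) so that the result lies in L^2_0
    mean = sum(w * x for w, x in zip(pi, f))
    return [x - mean for x in f]

p, q = 0.3, 0.1                        # hypothetical transition probabilities
P = [[1 - p, p], [q, 1 - q]]
pi = [q / (p + q), p / (p + q)]        # invariant distribution of P

f = center([1.0, -2.0], pi)
ratio = l2_norm(apply_kernel(P, f), pi) / l2_norm(f, pi)
print(f"norm on L^2_0: {ratio:.3f}, supremum of admissible gaps: {1 - ratio:.3f}")
```

In this example the norm is $|1 - p - q| = 0.6$, so any $\gamma < 0.4$ is a spectral gap in the sense defined above.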

The following theorem quantifies the benefits of the new RWM algorithm over the standard one. 

Theorem 4.21. For the standard RWM algorithm:

• If $\beta = N^{-a}$ with $a \in [0, 1)$ then the spectral gap is bounded above by $C_p N^{-p}$ for any positive integer $p$.

• If $\beta = N^{-a}$ with $a \in [1, \infty)$ then the spectral gap is bounded above by $C N^{-a/2}$.

Hence the spectral gap is bounded above by $C N^{-1/2}$. For the new RWM algorithm the spectral gap is bounded
below independently of $N$. Hence we have a central limit theorem and, for $u^{(0)} \sim \mu$ and $C$ independent of $N$,

\[ \mathbb{E}^{\mu} \Bigl| \frac{1}{K} \sum_{k=1}^{K} f(u^{(k)}) - \mathbb{E}^{\mu^N} f \Bigr|^2 \le C K^{-1}. \]
5. Bibliographical Notes



• Subsection 1.1. See [BS94] for a general overview of the Bayesian approach to statistics in the finite 
dimensional setting. The Bayesian approach to linear inverse problems with Gaussian noise and prior 
in finite dimensions is discussed in [Stu10, Chapters 2 and 6] and, with a more algorithmic flavour, in
the book [KS05]. 

• Subsections 1.2, 1.3. See [Eva98] for theory relevant to both the heat equation and the elliptic equation.
For more detail on the heat equation as an ODE in Hilbert space, see [Paz83, Lun95]. For further reading
on severely ill-posed problems see [Stu10, Chapters 3 and 6], [KvdVvZ11b], [ASZ12]; for linear inverse
problems in infinite dimensions see [Stu10, Chapters 3 and 6], [ALS12], [Man84], [LPS89], [KvdVvZ11a];
for the elliptic inverse problem (determining the permeability from the pressure in a Darcy model of
flow in a porous medium) and for bounds on the solution obtained using the Lax-Milgram theorem, see [Ric81,
DS11]; for the inverse heat equation, see [Kir96, EHN96].

• Subsection 2.1. For general discussion of the properties of random functions constructed via
randomization of coefficients in a series expansion see [Kah85].

• Subsection 2.2. These uniform priors have been extensively studied in the context of the field of
Uncertainty Quantification and the reader is directed to [CDS10, CDS12] for more details. Uncertainty
Quantification in this context does not concern inverse problems, but rather studies the effect, on
the solution of an equation, of randomizing the input data. Thus the interest is in the pushforward 
of a measure on input parameter space onto a measure on solution space, for a differential equation. 
Recently, however, these priors have been used to study the inverse problem; see [SS12]. 

• Subsection 2.3. Besov priors were introduced in the paper [LSS09] and Theorem 2.2 is taken from that 
paper. We notice that the theorem constitutes a special case of the Fernique Theorem in the Gaussian 
case q = 2; it is restricted to a specific class of Hilbert space norms, however, whereas the Fernique
Theorem in full generality applies in all norms on Banach spaces which have full Gaussian measure. A
more general Fernique-like property of the Besov measures is proved in [DHS12] but it remains open
to determine the appropriate complete generalization of the Fernique Theorem to Besov measures. 

• Subsection 2.4. The general theory of Gaussian measures on Banach spaces is contained in [Lif95, 
Bog98]. The text [DZ92], concerning the theory of stochastic PDEs, also has a useful overview of 
the subject. The Karhunen-Loeve expansion (2.7) is contained in [Adl81]. The informal calculation 
concerning the covariance operator of the Gaussian measure which follows Theorem 2.4 may be proved
using characteristic functions; see, for example, Proposition 2.18 in [DZ92]. All three texts include 
statement and proof of the Fernique Theorem in the generality given here. The Kolmogorov continuity 
theorem is discussed in [DZ92] and [Adl90]. Proof of Holder regularity adapted to the case of the 
periodic setting may be found in [Hai09] and [Stu10, Chapter 6]. For further reading on Gaussian
measures see [DP06]. 

• Subsection 3.1. Theorem 3.1 is taken from [HSVW05] where it is used to compute expressions for the 
measure induced by various conditionings applied to SDEs. The Example following Theorem 3.1,
concerning end-point conditioning of measures defined via a density with respect to Wiener measure, 
finds application to problems from molecular dynamics in [PS10, NST]. Further material concerning
the equivalence of posterior with respect to the prior may be found in [Stu10, Chapters 3 and
6], [ALS12], [ASZ12]. The equivalence of Gaussian measures is studied via the Feldman-Hajek theorem;
see [DPZ92] and [DZ92]. 

• Subsection 3.2. General development of Bayes' Theorems for inverse problems on function space, along 
the lines described here, may be found in [CDRS09, Stu10]. The reader is also directed to the papers
[Las02, Las07] for earlier related material, and to [Las11, Las12a, Las12b] for recent developments.

• Subsection 3.3. The inverse problem for the heat equation was one of the first infinite dimensional 
inverse problems to receive Bayesian treatment; see [Fra70]. The problem is worked through in detail 
in [Stu10]. To fully understand the details the reader will need to study the Cameron-Martin theorem
(concerning shifts in the mean of Gaussian measures) and the Feldman-Hajek theorem (concerning 
equivalence of Gaussian measures); both of these may be found in [DZ92, Lif95, Bog98] and are also 
discussed in [Stu10].

• Subsection 3.4. The elliptic inverse problem with the uniform prior is studied in [SS12]. A Gaussian 
prior is adopted in [DS11], and a Besov prior in [DHS12].

• Subsection 4.1. Relationships between the Hellinger distance on probability measures, and the Total 
Variation distance and Kullback-Leibler divergence may be found in [GS02], [Pol]. 

• Subsection 4.2. The relationship between expectations and Hellinger distance, as used in Remark 4.8, 
is discussed in [Stu10].

• Subsection 4.3 concerns measure preserving continuous time dynamics. The finite dimensional aspects
of this subsection, which we introduce for motivation, are covered in the texts [Oks03] and [Gar85]; the
first of these books is an excellent introduction to the basic existence and uniqueness theory, outlined in
a simple case in Theorem 4.9, whilst the second provides an in-depth treatment of the subject from the
viewpoint of the Fokker-Planck equation, as used in Theorem 4.11. This subject has a long history which
is overviewed in the paper [HSV07] where the idea is applied to finding SPDEs which are invariant with
respect to the measure generated by a conditioned diffusion process. This idea is generalized to certain
conditioned hypoelliptic diffusions in [HSV11b]. It is also possible to study deterministic Hamiltonian
dynamics which preserves the same measure. This idea is described in [BPSSS11] in the same set-up
as employed here; that paper also contains references to the wider literature. Lemma 4.14 is proved in
[MPS12]. Lemma 4.20 requires knowledge of the invariance of Ornstein-Uhlenbeck processes together
with invariance of finite dimensional first order Langevin equations with the form of gradient dynamics
subject to additive noise. The invariance of the Ornstein-Uhlenbeck process is covered in [DPZ96] and
invariance of finite dimensional SDEs using the Fokker-Planck equation is discussed in [Gar85]. The
C-Wiener process, and its properties, are described in [DZ92].

• Subsection 4.4 concerns MCMC methods which are invariant with respect to the target measure $\mu$.
The standard RWM was introduced in [MRTT53] and led, via the paper [Has70], to the development of
the more general class of Metropolis-Hastings methods. The paper [CRSW12] overviews this
subject area, including the new RWM method. The specific idea of the new RWM is contained in
the unpublished paper [Nea98], equation (15). The paper [Tie98] is a key reference which provides a
framework for the study of Metropolis-Hastings methods on general state spaces, and may be used
to establish that the new RWM method is well-defined on the Hilbert space H. Theorem 4.21 is a
summary of the results in the paper [HSV11a].

Acknowledgements AMS is grateful to Sergios Agapiou and to Yuan-Xiang Zhang for help in the
preparation of these lecture notes. He is also grateful to EPSRC, ERC and ONR for financial support.